In November 2022, Argentina walked into their World Cup opener on a 36-game unbeaten streak and lost 2-1 to Saudi Arabia, a team one AI model gave a 0.06% chance of winning the tournament. Saudi Arabia took three shots the entire game, and the only two on target both went in. Three weeks later, Argentina won the World Cup. They did not lose another match. If you want to predict the World Cup winner, start with that paradox.
So which result was the models’ failure? The answer is neither, and that answer is the whole subject of this post. A predictive model does not tell you who will win. It tells you how probable each outcome is, and a probability is not a promise. The best teams win over time, but in any given World Cup, chaos has a significant vote. If you take one thing from this post, take that sentence, because it changes how you should read every prediction graphic you see this summer.
This is the second post in our series on AI and machine learning at the 2026 World Cup, and like the first, it asks what it really means to predict the World Cup winner. It sits on the off-the-pitch side of the framework from Post 1. This technology is not helping any team win. It is changing how the rest of us watch.
What the Models Say About 2026
The most cited model in soccer is the Opta supercomputer, a statistical model from the sports data company Opta that simulated this tournament 25,000 times before kickoff. Spain won in 16.1% of those simulations, making them the favorite, with France, England, and Argentina each above 10%. Down at the other end, 28 teams came in below 1%, and one team, Curaçao, won the tournament in zero of 25,000 simulated worlds. Haiti, to put the scale of the longshots in perspective, won exactly once.
Read that headline number again, though. The favorite wins 16.1% of the time. That means the model’s single strongest claim about this tournament is that Spain will probably not win it. An 84% chance of someone else is not a hedge or a cop-out. It is the model being honest about a sport that resists certainty, and no team in the field clears 20%.
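To make "simulated the tournament 25,000 times" concrete, here is a minimal sketch of the general idea. It is not Opta's method. The eight teams, their ratings, the fixed bracket, and the Elo-style win formula are all assumptions invented for illustration.

```python
import random
from collections import Counter

# Hypothetical strength ratings for an 8-team knockout bracket (made-up numbers).
RATINGS = {
    "A": 2000, "B": 1900, "C": 1850, "D": 1800,
    "E": 1750, "F": 1700, "G": 1650, "H": 1600,
}


def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo-style chance that team A beats team B (assumed; knockouts have no draws)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def simulate_bracket(teams: list[str], rng: random.Random) -> str:
    """Play one single-elimination bracket and return the champion."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        # Adjacent teams in the list meet each other in every round.
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p = win_probability(RATINGS[a], RATINGS[b])
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0]


def title_shares(n_sims: int = 25_000, seed: int = 0) -> dict[str, float]:
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    teams = list(RATINGS)
    champions = Counter(simulate_bracket(teams, rng) for _ in range(n_sims))
    return {team: champions[team] / n_sims for team in teams}


shares = title_shares()
for team, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {share:.1%}")
```

Even in this toy bracket, where team A is the best side by a clear margin, it has to survive three straight matches, so its title share stays well short of certainty. The real tournament asks the favorite to survive far more.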
I asked Finn McCallum, the soccer player and coach from my previous post, for his honest pick, blind to any model output. He said Spain, with France right behind, and added that most people who watch the sport would say the same. The fan and the supercomputer agree almost exactly, and they got there independently, which raises a fair question: if the model and an informed fan land in the same place, what is the model for? The answer is in the percentages. Finn can tell you Spain is the best team. The model can tell you that the best team at a World Cup still loses more often than it wins, and it can put a number on the gap.
Why Soccer Fights the Math
Soccer is close to the worst case for a prediction model, and the reasons are the same ones that make it thrilling to watch. Start with scoring. A basketball game produces well over a hundred scoring events, enough for skill to reliably separate from luck within a single game. A soccer game produces two or three goals, sometimes none. Goals are rare events, and rare events are inherently high variance, which means one deflection, one referee decision, or one moment of brilliance can decide a match that the better team controlled for 89 minutes. Finn put it plainly: a small country will park the bus, packing everyone behind the ball to defend deep for 90 minutes, score on one of its only two shots, and win 1-0. He is confident we will see it happen at this World Cup, and he has history on his side, because that is a nearly exact description of what Saudi Arabia did to Argentina. The models agree with him too, which is precisely why nobody is anywhere close to a sure thing.
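The low-scoring argument can be checked with a toy calculation. The sketch below assumes each side's score is an independent Poisson count, which is a common simplification rather than a claim about any real model, and the scoring rates are invented. It holds the favorite's edge at 2-to-1 and changes only how many scoring events a game contains.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k scores) for a Poisson count, computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def upset_probability(strong_mean: float, weak_mean: float) -> float:
    """Chance the weaker side strictly outscores the stronger one (draws excluded)."""
    # Truncate the sums far out in the tail, where the remaining mass is negligible.
    cap = int(strong_mean + 12 * math.sqrt(strong_mean) + 25)
    strong = [poisson_pmf(k, strong_mean) for k in range(cap + 1)]
    weak = [poisson_pmf(k, weak_mean) for k in range(cap + 1)]
    total = 0.0
    strong_below = 0.0  # running P(strong side scores fewer than w)
    for w in range(cap + 1):
        total += weak[w] * strong_below
        strong_below += strong[w]
    return total


# Hypothetical rates: same 2-to-1 edge, very different numbers of scoring events.
low_scoring = upset_probability(1.8, 0.9)      # soccer-like, under three goals a match
high_scoring = upset_probability(36.0, 18.0)   # twenty times as many scoring events
print(f"upset chance, low-scoring game:  {low_scoring:.1%}")
print(f"upset chance, high-scoring game: {high_scoring:.1%}")
```

Under these assumptions the underdog wins outright far more often in the low-scoring game, even though the favorite is exactly as superior in both.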
Then there is the sample size. A club like Liverpool plays 38 Premier League games a season, often 60 or more across all competitions, enough games for luck to cancel out and true quality to show in the standings. A team that wins this World Cup will play eight matches, one more than before now that the expanded field adds another round to the knockout stages. Three of those are group games, and as Finn pointed out, you cannot afford a single bad day, because after the group stage it is win or go home. And unlike a club, a national team is a moving target: the roster turns over between tournaments and even between matches, so a model is never really rating the same team twice. Eight games is not a season. It is a long weekend, statistically speaking, and no model can squeeze certainty out of it.
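Here is a rough way to put a number on the sample-size problem. The sketch assumes a team that wins each game with a fixed 60% probability, ignores draws, and treats games as independent. None of that is true of real soccer, but it isolates the effect of the number of games.

```python
from math import comb


def prob_no_majority(p_win: float, n_games: int) -> float:
    """Chance a team with per-game win probability p_win wins at most half of n_games."""
    return sum(
        comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
        for k in range(n_games // 2 + 1)
    )


# Hypothetical 60% team over a World Cup run versus a league season.
for n in (8, 38):
    print(f"{n} games: {prob_no_majority(0.6, n):.1%} chance of no winning record")
```

The same team, with the same quality, fails to post a winning record much more often over eight games than over 38. Quality did not change. Only the sample did.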
And soccer offers almost nothing repeatable to learn from. American football is built from designed plays that can be run again and again, which is a gift to anyone modeling it. Soccer is 45 unbroken minutes at a time, and no two moments are alike. A red card in the 40th minute restructures the entire probability space of a match. So does an injured hamstring.
Why My Model Failed
I have spent 25 years building machine learning models, so I did not want to write about World Cup prediction without trying it myself. I trained a model on historical international match results to rate every team in the field. Then I backtested it, which means testing the model against tournaments that have already happened to see whether its probabilities hold up before trusting it in the future.
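For readers who have not run a backtest, here is a minimal sketch of one way to score it. The Brier score is a standard measure for probability forecasts, but using it here is my choice for illustration, and the forecasts and outcomes below are invented, not results from my model.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between forecast probabilities and what happened (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical backtest rows: the model's pre-match win probability for one side,
# and whether that side actually won (1) or not (0).
model_probs = [0.70, 0.55, 0.80, 0.30, 0.60]
actual = [1, 0, 1, 0, 1]

# Baseline that knows nothing: call every match a coin flip.
coin_flip = [0.5] * len(actual)

print(f"model:     {brier_score(model_probs, actual):.4f}")
print(f"coin flip: {brier_score(coin_flip, actual):.4f}")
```

A model earns trust only if it beats the know-nothing baseline on matches it never saw during training. A perfect forecaster scores 0, and the coin flip scores 0.25.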
It failed the backtest, and its ratings told me why before the backtest did: it ranked Iraq above France. The model was not broken. It did exactly what I trained it to do, which readers of this blog will recognize as the way these systems often fail. Iraq plays most of its competitive matches in Asian qualifying against weaker opposition and wins a lot of them. France plays the strongest opponents in the world and drops points to good teams. A model trained on raw results, without accounting for who those results came against, learns that Iraq wins often and France sometimes does not. My training data was reliable, but it was missing the one feature that makes a result mean anything, which is the strength of the opponent.
I have watched this same failure outside of soccer more than once: you give a model a single number to maximize, and it improves that number in a way that defeats the real goal the number was supposed to measure. A model does not know what you meant for it to learn. It learns what the data rewards. Feed it wins without context, and it will faithfully reward winning in a weak league over losing narrowly in a strong one, and it will do so with total confidence, because confidence is not the same thing as being right. The Iraq rating was not the model misbehaving. It was the model holding up a mirror to a gap in what I gave it, which is the most useful thing a backtest can do.
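The failure is easy to reproduce in miniature. In the sketch below, Team X and Team Y are invented, as are their opponents' ratings. The adjustment is the linear performance-rating rule of thumb borrowed from chess, used here as a stand-in and not the fix any production model uses.

```python
from statistics import mean

# (opponent_rating, result) with result 1 = win, 0 = loss. All numbers are invented.
TEAM_X = [(1400, 1)] * 8 + [(1400, 0)] * 2  # weak schedule, wins often
TEAM_Y = [(1900, 1)] * 5 + [(1900, 0)] * 5  # strong schedule, splits its results


def raw_win_rate(matches: list[tuple[int, int]]) -> float:
    """What a model sees when it is trained on results with no opponent context."""
    return mean(result for _, result in matches)


def performance_rating(matches: list[tuple[int, int]]) -> float:
    """Rule of thumb: average opponent rating + 400 * (wins - losses) / games."""
    wins = sum(result for _, result in matches)
    losses = len(matches) - wins
    avg_opponent = mean(opponent for opponent, _ in matches)
    return avg_opponent + 400 * (wins - losses) / len(matches)


for name, matches in (("Team X", TEAM_X), ("Team Y", TEAM_Y)):
    print(f"{name}: win rate {raw_win_rate(matches):.2f}, "
          f"performance rating {performance_rating(matches):.0f}")
```

Ranked by win rate, Team X comes out on top. Ranked by a measure that knows who the wins came against, the order flips.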
When I asked Finn what factors he would distrust in a model, he named the FIFA rankings, because they reward teams for winning without asking who they beat. He diagnosed my model’s exact point of failure without ever seeing it. Every valid soccer model, Opta’s included, is built around opponent-adjusted ratings for precisely this reason, with recent matches weighted more heavily than old ones. Finn had independently arrived at that second principle too: he would throw out head-to-head history entirely, because not one player from the famous 2014 Brazil versus Germany match will be on the field this summer, and content accounts mostly use head-to-head records to make games feel more historical than they are. The math and the informed fan keep reaching the same conclusions. The math is the fan’s intuition with the wishful thinking removed.
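Both principles, opponent adjustment and recency weighting, fit in a few lines. This is an Elo-style sketch under my own assumptions: the K-factor of 20, the two-year half-life, and the idea of scaling the update by match age are illustrative choices, not a description of how Opta or any other published model works.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo expectation: the share of points a team should take from this opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def recency_weight(age_years: float, half_life_years: float = 2.0) -> float:
    """Hypothetical decay: a match loses half its influence every half_life_years."""
    return 0.5 ** (age_years / half_life_years)


def rating_change(rating: float, opponent: float, score: float,
                  age_years: float = 0.0, k: float = 20.0) -> float:
    """Rating movement for one result (score: 1 win, 0.5 draw, 0 loss)."""
    surprise = score - expected_score(rating, opponent)
    return k * recency_weight(age_years) * surprise


print(rating_change(1800, 1800, 1.0))                 # beat an equal team, just now
print(rating_change(1800, 1400, 1.0))                 # beat a much weaker team
print(rating_change(1800, 1400, 0.0))                 # lost to a much weaker team
print(rating_change(1800, 1800, 1.0, age_years=2.0))  # the same win, two years old
```

The rating moves only by how much a result surprised the model. Beating a weak team barely counts, losing to one costs a lot, and an old result counts for less than a fresh one.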
What Even the Good Models Cannot See
Opponent adjustment fixes my model’s blindness. It does not fix the deeper one, which is that some of the most important variables in this tournament barely exist as data. No feature engineering can put a value on something that has never been measured, and this World Cup has at least three examples walking around in cleats.
Spain’s probability rests partly on Lamine Yamal, who is already regarded as one of the best players in the world and turns 19 six days before the final. Finn’s view is that Yamal moves Spain from a contender to a favorite, and his reasoning was the most data-literate thing anyone said to me all week: the kid has played only three seasons, so there is barely any history for a model to learn from. A model cannot correctly value a player it has hardly seen. And the variable got harder in April, when Yamal tore a hamstring that ended his club season. He has recovered faster than the early timelines feared and is back in Spain’s squad, cleared to play, but the coaching staff is managing his minutes carefully through the group stage rather than risking a relapse. So the model is being asked to price a teenager it barely has data on, who is now also working back from injury. No simulation knows how that resolves.
The inverse case is Guillermo Ochoa, Mexico’s 40-year-old goalkeeper, who plays his club soccer for AEL Limassol in Cyprus and was just named to a record sixth World Cup. His club data says he is a journeyman at the end of the road. His World Cup record says that every four years he becomes briefly brilliant. Models trained on club performance systematically undervalue him, and Mexican fans systematically do not. Somewhere between those two assessments is information no one has figured out how to put in a feature column.
Finn’s dark horse pick fits the same pattern from the team side. Ecuador conceded five goals in 18 qualifying matches, the best defensive record in South America, and spent a total of 97 minutes trailing across the entire campaign. A defense that airtight compresses every match toward the low-scoring coin flips described above, which is exactly the environment where an underdog’s probability quietly grows.
The Number Is the Reason to Watch
Post 1 ended with a question this post owes an answer to: if a model can predict the winner, does that make watching more or less interesting? Now we can answer it properly. The model cannot predict the winner, and it is honest enough to say so. What it predicts is the shape of the chaos: an 84% chance that the favorite falls, a field where one team cannot win in 25,000 tries and another wins exactly once, and both still get to play, and a tournament short enough that variance never has time to cancel out. Saudi Arabia beating Argentina on two shots was not the model failing. It was the model’s fine print coming true, and the fine print is the best part. The probabilities do not drain the drama from the next five weeks. They measure exactly how much drama is available, and the answer this year is: more than any World Cup in decades.
This post is part of our ML in the Wild series on AI at the 2026 World Cup. The first post covers the two ways AI is changing the tournament, and the next one looks at how clubs use machine learning to scout the players who end up here.
