It’s that time again for international football fans: World Cup time. If you follow me on Twitter, then you know that I’ve got a hot little 2018 FIFA World Cup forecasting model up and running. “Just a fun little weekend project,” I said to myself the Friday the group stage began… now it’s Monday evening, and here I sit, explaining it to you…
The purpose of this post is to show you how I made the model and point you toward an estimate of the most likely winner of the tournament (the probabilities above were generated before any matches had been played).
This method relies partially on the excellent “worldcup” package written by Ava Yang, which itself is an adaptation of a football prediction model developed by Claus Thorn Ekstrøm. Though my process makes several adjustments in both approach and technical details, their work is great and well worth checking out!
To forecast the FIFA World Cup (a five-stage tournament made up of a 32-team group stage, a knockout round with the top 16 teams, quarter finals, semi finals, and a final), you have to follow a few basic steps. First, you need a model that can predict the outcome of any game given a measurement of how good both teams are. Then, you need to be able to predict each game in all five stages based on who qualifies for that round. Between stages, you also need to account for changes in the measurement of how good each team is. Here are the four steps I took to make a model for the 2018 FIFA World Cup.
- Analyze the relationship between team ratings and goals scored
  - Teams are rated according to their Elo ratings, a method for ranking teams adapted from chess
- Produce a predictive model to forecast the goals scored by both teams in every game in the World Cup, given their Elo ratings
  - I use two independent Poisson processes, one per team, to predict the outcome of a match between any two teams
- Simulate the outcome of the World Cup match by match, stage by stage, until the final round
  - … and after each round played (with the group stage split up into three rounds), update the teams’ Elo ratings
- Repeat this simulation ten thousand times
  - The chance that any given team wins the World Cup (or makes it to a given stage of the tournament) is simply the number of trials in which it makes it that far divided by 10,000.
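To make the four steps concrete, here is a minimal sketch of the idea. It is not my actual code (that is in R, in the repository), and nearly everything in it is an assumption made for illustration: the log-linear link between Elo gap and expected goals, the `intercept`, `slope`, and `k` values, the four placeholder teams and their ratings, and the shortcut of simulating only a small knockout bracket with drawn matches settled by a coin flip instead of a full group stage.

```python
import math
import random
from collections import Counter


def goal_rate(elo_team, elo_opp, intercept=0.25, slope=0.002):
    """Expected goals for a team. The log-linear form and both
    coefficients are illustrative assumptions, not fitted values."""
    return math.exp(intercept + slope * (elo_team - elo_opp))


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's algorithm."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def play_match(a, b, ratings, rng, k=20.0):
    """Simulate one match, update both Elo ratings in place, return the winner."""
    goals_a = sample_poisson(goal_rate(ratings[a], ratings[b]), rng)
    goals_b = sample_poisson(goal_rate(ratings[b], ratings[a]), rng)
    if goals_a > goals_b:
        score_a, winner = 1.0, a
    elif goals_a < goals_b:
        score_a, winner = 0.0, b
    else:
        # Assumption: a draw counts as 0.5 for Elo, and a coin flip
        # stands in for extra time and penalties.
        score_a, winner = 0.5, rng.choice([a, b])
    # Standard Elo update; k is an assumed value.
    expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
    shift = k * (score_a - expected_a)
    ratings[a] += shift
    ratings[b] -= shift
    return winner


def simulate_knockout(start_ratings, rng):
    """Play one single-elimination bracket (team count must be a power of two)."""
    ratings = dict(start_ratings)  # copy so every trial starts from the same ratings
    teams = list(ratings)
    while len(teams) > 1:
        teams = [play_match(teams[i], teams[i + 1], ratings, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def win_probabilities(start_ratings, n_trials=10_000, seed=2018):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(start_ratings, rng) for _ in range(n_trials))
    return {team: wins[team] / n_trials for team in start_ratings}


if __name__ == "__main__":
    # Placeholder teams and ratings, not real Elo values.
    example = {"Team A": 2000, "Team B": 1900, "Team C": 1800, "Team D": 1700}
    for team, prob in win_probabilities(example).items():
        print(f"{team}: {prob:.3f}")
```

The real model does the same thing at a larger scale: a full group stage, the actual bracket, and ratings that are updated after every round.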
If you’re interested in more of the coding and modeling details, check out my repository on GitHub.
As of now, Germany has a decent lead in the 2018 fight for the World Cup. Stay up to speed with the forecasts here, as things are bound to change. I update the project page every day after all matches have been played.