The upcoming election in the Netherlands is going to be, as my uncle would say, a doozy. On March 15th the country will decide which party deserves control of its parliament. Will it be the incumbent People’s Party for Freedom and Democracy, headed by Mark Rutte? Or will the winner be the leader of the Party for Freedom, Geert Wilders, who has been called a “Dutch Trump?” Maybe it will be someone else entirely? All of these are within the realm of possibility. Maybe even probability.
Now, we’re ready to figure out how the forecast works:
The process for forecasting the Dutch election is much, much simpler than that of France or the United States. That’s because the Netherlands doesn’t have an electoral college or a two-round electoral system. Instead, the Netherlands simply converts each party’s share of the national vote into its share of seats in parliament. The process for forecasting those votes, in brief, is as follows:
Before we get started, I want to remind you of a few fundamental truths of the forecast model.
Every outcome has some chance of happening. Sometimes that chance is close to zero, and sometimes it is close to one hundred percent. But oftentimes, the outcomes we feel most certain about have just an eighty percent chance of happening.
Being told that a given presidency is only fifty percent likely is not very informative at face value. Instead of just spouting off numbers, analysts should use forecasts to explain what we can expect from certain events, often based on what (properly calibrated) forecasts have said in the past.
The forecast can also be useful in gauging the impact of certain events, or the wiggle room candidates have when making important political choices. These are just a couple of the scenarios in which the probabilities of the forecast can be helpful.
It only makes sense for us to take recent data more seriously than old, since events can render old data obsolete. But it is often the case that political environments change without big events spurring that shift. Not only does this make sense, but doing so has helped our model make better predictions in the past.
This also causes the model to look rather volatile for some races. However, we would rather have this volatility than a model that treats information that is one week old the same as information coming out the day before election day.
The oft-repeated adage that “the past is not indicative of the future” may be right for some things: short-term poker odds and votes on low-level congressional legislation, maybe. But in forecasting there is one thing the past conveys very well: error. We can use the average of past polling error as an indicator of the error, or uncertainty, that we might see this year.
Finally, any good modeler will give you the statistician George Box’s famous advice: “All models are wrong, but some are useful.”
The polling for Dutch elections comes in an odd form, so we have to change some things before we begin. For starters, election polling in the Netherlands reflects the number of seats each party is expected to win, not the share of the vote it will get. This is not a problem, as all we have to do is convert the number of seats back to a proportion of the total seats expected, which is the same as the proportion of the total vote. After collecting the polls (all data is entered by hand to ensure integrity), it is time to compute our average of the polls.
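The seat-to-share conversion is simple division. Here is a minimal sketch: the Tweede Kamer has 150 seats, so a party projected to win 30 seats is polling at 20% of the vote. The party names and seat counts below are illustrative, not real poll numbers.

```python
# Convert projected seat counts into vote shares.
# The Dutch Tweede Kamer has 150 seats in total.
TOTAL_SEATS = 150

def seats_to_share(projected_seats):
    """Map each party's projected seats to its share of the total vote."""
    return {party: seats / TOTAL_SEATS for party, seats in projected_seats.items()}

# Hypothetical seat projections from a single poll:
poll = {"VVD": 30, "PVV": 24, "CDA": 18, "D66": 18, "Other": 60}
shares = seats_to_share(poll)
# shares["VVD"] is 30 / 150 = 0.2, and because the seats sum to 150,
# the shares sum to 1 -- i.e., a complete breakdown of the vote.
```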
However, not all polls are created equal. So instead of taking a regular average of all polling, we compute a weighted average of the polls, similar to the way your grade-school math teacher would compute your final grade.
The weights are assigned based on the recency of a poll. This way, our model treats polls that read the opinions of people more recently as more representative of the current electoral climate (because they are). Does it make sense that a week-old poll with 400 respondents should count the same as a one-day-old poll with 1,200 respondents? Of course not. We try to fix this problem.
After all the weights are assigned, we get the snapshot average of polls for today. We keep that list of polling averages for the rest of the day’s model.
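The weighting scheme described above can be sketched in a few lines. The exponential decay with a seven-day half-life below is an illustrative assumption, not the model's actual weighting formula; the point is only that older polls contribute less to the snapshot.

```python
import numpy as np

# Illustrative recency weighting: a poll's weight halves every 7 days.
# (The half-life here is an assumption for demonstration purposes.)
HALF_LIFE_DAYS = 7.0

def weighted_average(shares, ages_in_days):
    """Average poll results for one party, down-weighting older polls."""
    shares = np.asarray(shares, dtype=float)
    ages = np.asarray(ages_in_days, dtype=float)
    weights = 0.5 ** (ages / HALF_LIFE_DAYS)  # a week-old poll counts half
    return float(np.sum(weights * shares) / np.sum(weights))

# Three polls of one party's vote share, taken 0, 3, and 10 days ago:
avg = weighted_average([0.20, 0.18, 0.24], [0, 3, 10])
```

The result sits closer to the most recent reading than a plain average would, which is exactly the behavior the model wants.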
Simulating the election, in comparison to gathering polls, is much more complex (bear with me). In this step I detail how we use the historical error of polls to quantify the uncertainty in this year’s election.
Here I am going to explain how we would do one single simulation of the election. Keep in mind that we repeat this several thousand times (50,000!) so no single trial has much weight in the forecast. That’s the idea, anyway, behind the law of large numbers: run so many trials that no single outlier can sway the result.
The first step is to randomly vary each party’s vote share to get an idea of the range of possibilities in the election. We do this with the Dirichlet distribution. We estimate the range of possibilities for each party using historical polling error data gathered by the political science professors Will Jennings and Christopher Wlezien. In the past, polls have missed the final result by an average of roughly 3% on election day. This error is larger earlier in the campaign, and looks like this:
For example, this first step of trial may look like this (NOTE for simplicity the numbers below are made up):
Before random variation:
| Trial | Other Vote (%) | VVD | PVV | CDA | D66 |
|-------|----------------|-----|-----|-----|-----|
| 1     | 43             | 17  | 16  | 12  | 12  |
and with one trial random variation:
| Trial | Other Vote (%) | VVD | PVV | CDA | D66 |
|-------|----------------|-----|-----|-----|-----|
| 1     | 41             | 16  | 18  | 13  | 12  |
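One trial of this random-variation step can be sketched with NumPy's Dirichlet sampler. The concentration parameter below is chosen so that a party near 20% of the vote moves by roughly ±3 points, in the spirit of the historical election-day error; it is an illustrative calibration, not the model's exact one, and the snapshot shares are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def vary_shares(shares, concentration=175.0):
    """Draw one randomly perturbed set of vote shares that still sums to 1.

    Larger `concentration` means draws hug the polling snapshot more
    tightly; smaller values spread the simulated outcomes further out.
    """
    shares = np.asarray(shares, dtype=float)
    return rng.dirichlet(shares * concentration)

# Snapshot polling average: VVD, PVV, CDA, D66, Other (made-up numbers)
snapshot = np.array([0.17, 0.16, 0.12, 0.12, 0.43])
trial = vary_shares(snapshot)
# `trial` still sums to 1; each party has moved a few points off the snapshot.
```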
Then, simply, we assign a winner to this trial of the simulation:
This is perhaps the easiest step, as we’re just counting up the number of simulations in which each party wins. Then we divide that number of wins by the total number of simulations (50,000) to get our final win probabilities. They look like this!
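In code, the tallying step might look like the sketch below. The snapshot shares and Dirichlet concentration are the same illustrative assumptions as before, so the resulting probabilities are for demonstration only.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(42)

PARTIES = ["VVD", "PVV", "CDA", "D66", "Other"]
snapshot = np.array([0.17, 0.16, 0.12, 0.12, 0.43])  # made-up snapshot
N_TRIALS = 50_000

# Draw all 50,000 trials at once: each row is one simulated election.
trials = rng.dirichlet(snapshot * 175.0, size=N_TRIALS)

# "Other" is a catch-all bucket, so exclude it when picking a winner.
winners = np.argmax(trials[:, :4], axis=1)
wins = Counter(winners)

# Win probability = share of simulations in which the party finished first.
probabilities = {PARTIES[i]: wins[i] / N_TRIALS for i in range(4)}
```

Because every trial produces exactly one winner, the probabilities across parties sum to one.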
That’ll do it for today. Questions? Comments? Concerns? (Hopefully not concerns). Send me a tweet.
Keep your eye on the forecast for updates and new features.