A Primer on Polling Error in 2016 — A Historical and Comparative Perspective

Polling error in the 2016 US election may not have been as bad as you think. Where does it stand with history? With other democracies?

By George Elliott Morris and Alexander Agadjanian / January 03, 2017




I wrote this analysis with my friend Alexander Agadjanian. Follow him on Twitter and check out his blog!


Primary Findings

Hillary Clinton bested Donald Trump in the 2016 popular vote by a margin of 2.1 percentage points. By our best estimates, public opinion polls overestimated Clinton’s win margin by just a single percentage point, a rather impressive performance by pre-election polls at the national level.

Historically, pre-election polls are relatively good indicators of election outcomes. Our research shows that they have an average error of around 2.4 percentage points dating back to 1980. Measured by the error between polled and actual win/loss margins for Democratic candidates, national polling in 2016 will go down as the fourth most accurate in the last 10 US presidential elections. For an industry that is said to be getting worse and worse at reading the American electorate, these findings are revealing and significant. That being said, there remain areas of serious concern.

Yet this simple comparison of final polling accuracy is not, by itself, substantial evidence that US polls are any good at all. Could it be that polls in other western democracies are of higher quality than those in the US? Do voters in the UK, for example, have a better sense of the course of their elections months in advance?

In that vein, we incorporate an analysis of public opinion polling in the United Kingdom as a point of comparison. We find that polls in the United States are more accurate on election day than those in the UK. In addition, US polls seem to show greater variance around observed campaign inflection points such as the primary season, party conventions, and presidential debates, which suggests they are better at reading the dynamics of the electorate. Alternatively, the greater volatility in US polling, as documented by others, could signify more erratic polling measurements rather than meaningful movement. This question, however, is not central to our inquiry.

Figure 1: Raw Dem/Lab margin averaged over all years, with three overlaid trend lines (US, UK, US 2016)

Our finding that US polls are more accurate in the last week before the election holds true for almost every day of the 2016 campaign cycle. Only during the heat of the 2016 primary elections and the presidential debates did the level of polling error in 2016 come close to rivaling the average across all US and UK elections.

More specifically, the last week’s worth of polling in 2016 had an average error of 1.07 percentage points, roughly a point and a third lower than the average of all final-week polling in US elections since 1980, and much better than the 2015 UK election, which had 5.6 percentage points of raw polling error.

Overall, polling has become much more accurate over the years, and we now get more accurate poll results earlier in the cycle. In 2000, for example, polls taken 300 days out had error upwards of 20 percentage points. In 2016, error 300 days out was a comparatively minuscule four points. As if that weren’t enough, we find that the 2016 election continues the trend of increasing US polling accuracy relative to the UK.

United States: Historical Polling Accuracy

In 1980, the first cycle in our data, there was huge error in pre-election public opinion polls. Even worse, volatile spikes in error created far more noise than signal. At one point, the polls would have led you to expect Jimmy Carter to beat Ronald Reagan by more than 30 percentage points! Only once July came around did pollsters pick up on Reagan’s huge support among the silent majority, and even then they did not anticipate the nine-point margin by which he ultimately won. 1992 also saw larger than average error. But what is the average, and what does it tell us?

Year      Error in Last 7 Days of Election Cycle (pct. pts.)
1980      7.56
1984      1.51
1988      0.36
1992      0.86
1996      3.22
2000      4.31
2004      1.44
2008      0.58
2012      2.98
2016      1.07
Average   2.39

According to our analysis, which measures the error in each day’s polled Democratic win margin, the average polling error in the last week of United States election campaigns is 2.39 percentage points. Empirically, this error has still allowed for accurate predictions of elections, but error earlier in the cycle can be quite high, as seen in Figure 2a, which displays error over the year leading up to Election Day for every cycle since 1980.
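As a concrete illustration of this calculation, here is a minimal sketch in Python; the `polls` DataFrame and its margins are hypothetical placeholders, not our actual data:

```python
import pandas as pd

# Hypothetical daily polling series: the polled Democratic win margin
# on each of the last seven days before Election Day.
polls = pd.DataFrame({
    "days_to_election": [7, 6, 5, 4, 3, 2, 1],
    "polled_dem_margin": [3.4, 3.1, 3.3, 2.9, 3.0, 3.2, 3.1],
})

ACTUAL_DEM_MARGIN = 2.1  # Clinton's actual 2016 popular vote margin

# Daily error is the absolute gap between polled and actual margins;
# final-week error is the average of those daily errors.
polls["error"] = (polls["polled_dem_margin"] - ACTUAL_DEM_MARGIN).abs()
final_week_error = polls["error"].mean()
print(round(final_week_error, 2))  # ~1.04 with these made-up numbers
```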

Figure 2a: average US dem margin graphic faceted by year (all)

Take 1980, for example. The error of the final week’s polls was 7.6 percentage points. However, error 300 days before the election (in early February) reached upwards of 40 points. This is mostly due to a few polls that had Democrat Jimmy Carter beating Ronald Reagan by 30 to 35 points, a shockingly inaccurate read, as Reagan went on to beat the Georgia peanut farmer by nine percentage points. This early error has greatly decreased in recent years: the maximum error in the entire 2016 election cycle was just 15 percentage points.

The only other time the error grew as large as in the first few months of the 1980 cycle was in 1992. One year before the 1992 election, polls had George H.W. Bush beating Bill Clinton by more than 30 points, though Clinton emerged with a six-point victory on election night. These two cases of extreme error, the early periods in 1980 and 1992, distort the progression of error observed in the small-multiple graphs for other years. To get a closer look at polling error in more recent years, Figure 2b below looks only at error in elections from 1996 through the present.

Figure 2b: average US dem margin graphic faceted by year 1996-2016

Over these last 20 years, error in the Democrat’s margin trended downward over the course of the campaign in only three of six elections. This is a bit surprising, as we would expect error to generally decrease closer to Election Day. In 1996, the error slightly increased as Election Day neared, though it dropped in the final two months or so, ending at 3.2 points. Notably, in 2012 and 2016, the error stayed fairly stable across the campaign season. During only one brief period (in 2016) did the error ever extend beyond five percentage points. The final error in national polls was 1.1 points in 2016, lower than the three-point error in 2012, despite far more controversy over polling in 2016.

Finally, to measure error in an alternative way, we use the average error in Democratic and Republican vote shares since 1980. This can reveal whether polls have over- or underestimated one party more than the other in modern history. For example, in 2016, it is well understood that polls underestimated Republican candidate Donald Trump’s support at the state level. In the plot below, we check whether something similar appears at the national level, averaged across all elections since 1980:

Figure 3: Average error in Dem/Rep raw vote share, overlaid

As seen in Figure 3, there is not much difference between the error in measuring Democratic support and the error in measuring Republican support over the year before Election Day. The error in the Democratic vote starts off much larger on average, but from around the 300-days-out mark, the error in gauging the Republican vote proves slightly larger at nearly every juncture of the campaign. Within the last week of the campaign, the errors in the two major parties’ vote shares become almost indistinguishable.

United Kingdom: Historical Polling Accuracy

Similar to the approach for examining US polling above, we use the margin of victory of the Labour Party over the Conservative Party to gauge accuracy of pre-election polls in the United Kingdom. Our data for comparing UK polls and actual UK election outcomes stretches back to 1979, encompassing the last nine general elections.

Year      Error in Last 7 Days of Election Cycle (pct. pts.)
1979      4.33
1983      4.34
1987      1.41
1992      10.05
1997      5.61
2001      6.02
2005      2.02
2010      0.13
2015      5.6
Average   4.4

Relative to our results for US polling, errors in UK pre-election polling are much larger for polls conducted within a week of an election. After the first US election in our dataset (1980), the final error in the Democratic margin never exceeded 4.31 points through 2016. In the UK elections after the earliest one we examined, the error in the Labour Party margin exceeded that mark five times.

Figure 5: Average UK Lab margin, faceted by year

While not reaching the levels of early polling error seen in the 1980 and 1992 US cycles, the UK comparison yielded the single year with the highest final-week error in either country: the 1992 UK general election, in which the error was 10.05 points. Several polls within a week of the election had Labour leader Neil Kinnock ahead of Conservative leader John Major by a few percentage points; Major won by a 7.8-point margin.

Notably, as seen in Figure 5 and in the previous table, final-week error doesn’t seem to improve in more recent UK elections, in contrast to the US. The final-week error, supposedly the most accurate read of an electorate about to cast its vote, falls to 1.41 points in the 1987 election, but spikes back up to 10.05 a cycle later. After declining, it rises once again to 5.6 points off the mark in 2015. National polling error in the US certainly doesn’t decline uniformly in more recent elections, but it remains lower and much less volatile than error in UK popular vote polling. Year-to-year volatility in final-week error is thus greater for UK polling, while within-year volatility throughout a cycle is greater for US polling.

Relative to US polling, we did not find that UK polls were much more error-prone in gauging one major party’s support than the other’s. Given the popularized notion of the “Shy Tory Effect,” there is reason to suspect a party-specific error for Conservative candidates in UK elections. While we can’t directly speak to how much polls underestimated Conservative support (because we use the absolute value of polling errors), we can still gauge whether polls were more off, in whatever direction, for one party than another.

Figure 6: Average error in Lab/Con raw vote share, overlaid

As the figure above illustrates, it is Labour Party support that has been more consistently misestimated. At every point in the year before the election, on average, the error in Labour Party vote share has been greater than the error in Conservative Party vote share. While not drastic, the difference amounts to a few percentage points and is consistent across the election cycle. This does not prove that polls over- or underestimate Labour support, only that they are more error-prone in estimating the Labour vote.

2016: A Deeper Dive

Recall from earlier that final-week national polling error in 2016 was the fourth lowest of the ten US elections in our dataset. Indeed, the one-point overestimate of Hillary Clinton’s win margin is not a comparatively large error: Democrat Jimmy Carter’s margin was overestimated by seven points in 1980, and the average error across all elections, 2.39 points, is more than double this year’s. Even better, this accuracy extends to earlier in the campaign as well, as polling error in 2016 was lower than the average error on all but 10 to 15 days of the campaign cycle. And while 2012 saw large swings in error before Election Day, 2016’s final polling error was also significantly lower than that of the last cycle.

Figure 4: Poll error in 2016 vs. 2012 and the US average

The largest differences in error between 2016 polling and the 1980-2016 average appear earlier in the campaign cycle. Perhaps this indicates that candidate preferences mature over time, or that higher-quality polls are released later in the election cycle. It could also be that the higher volume of polls later in the cycle allows errors to cancel out in the average. We’re not certain, but it’s probably a combination of all three.

National Polls Were Fine. Why Did Trump Win Then?

However, our keen focus on national polling error does not explain the failed predictions of 2016; that burden falls on state-level polling. State polls proved more consequential for understanding the outcome of the election, and they held greater levels of error, not only in many non-competitive states but also in crucial battleground ones. Of course, it was the states that decided the electoral college breakdown, not the national popular vote. The figure below succinctly captures state-level polling error:

Figure 9: state level error

The graph plots Clinton’s polling margin, based on polls conducted entirely in the final week of the campaign, against her actual margin of victory in each state. Each dot represents a state. If a point falls below the dashed 45-degree line, Clinton’s margin was overestimated in that state; if it sits above the line, her margin was underestimated. In 12 of the 49 states that appear in the graph, Clinton’s margin was underestimated; in 37 states it was overestimated. That is to say, in 37 states Trump overperformed his polling average. Wyoming is the one state excluded here because no poll was conducted there in the final week of the campaign. If we extend the window to polls taken within two weeks of Election Day, Wyoming becomes another state in which Clinton’s margin was overestimated (making it 38 of 50 in which this occurred).
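To make the over/under classification concrete, here is a minimal sketch in Python; the polled margins below are rough approximations chosen for illustration, not our exact final-week averages:

```python
import pandas as pd

# Approximate final-week polled margins vs. actual results
# (positive values favor Clinton); illustrative numbers only.
states = pd.DataFrame({
    "state": ["FL", "PA", "WI"],
    "polled_margin": [1.0, 2.0, 5.0],
    "actual_margin": [-1.2, -0.7, -0.8],
})

# Clinton's margin was overestimated wherever the polled margin
# exceeded the actual margin (the point falls below the 45-degree line).
states["overestimated"] = states["polled_margin"] > states["actual_margin"]
states["error"] = (states["polled_margin"] - states["actual_margin"]).abs()
print(states)
```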

In the 12 states where Clinton’s margin was underestimated, the polling margin was off by an average of 2.81 points. In the 37 states where her margin was overestimated, the polling margin was off by an average of 6.28 points. Eleven states qualify as battleground territory, where Clinton or Trump’s margin of victory was less than five percentage points; on average, Clinton’s lead was overestimated by 3.01 points in these states.

It’s worth noting that the error in several key battleground states was not extraordinary, but in many cases it was just large enough to swing those states to Trump. Moderate yet consequential errors of 2.2 points in Florida, 2.7 in Pennsylvania, 3.7 in Michigan, 5.2 in North Carolina, and 5.8 in Wisconsin had an outsized impact in shifting the election outcome away from what most people expected.

Implicit in our analysis is the idea that national-level polling, and the popular vote it estimates, is meaningful for predicting an election outcome. Usually, that is a valid assumption. However, 2016 is yet another reminder that the electoral college determines the president, and five times in our history the electoral college has not sided with the plurality of voters. In elections from 1856 to 2016, the winner’s popular vote share explains just 56 percent of the variation in their share of electoral votes (both adjusted to be two-party shares). Below we illustrate this close pattern and how much the 2016 election outcome deviates from it.
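That 56 percent figure is the R-squared from a simple linear regression of electoral vote share on two-party popular vote share. A minimal sketch in Python, with made-up shares standing in for the actual 1856-2016 series:

```python
import numpy as np

def variance_explained(pop_share, ev_share):
    """R-squared of regressing electoral vote share on popular vote
    share; for simple regression this is the squared correlation."""
    r = np.corrcoef(pop_share, ev_share)[0, 1]
    return r ** 2

# Hypothetical winners' two-party popular and electoral vote shares.
pop = np.array([0.489, 0.513, 0.537, 0.552])
ev = np.array([0.568, 0.617, 0.688, 0.905])
print(variance_explained(pop, ev))
```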

Figure 8: Popular vote vs. electoral vote share split

Only five elections in history have resulted in a popular vote-electoral college split, with two of those instances coming in the last 16 years. Excluding the 1876 election, in which Colorado cast electoral votes without holding a popular vote, Donald Trump won with the smallest two-party share of the popular vote of any winner in the period we examine.

It’s clear that the 2016 election entailed one of the biggest asymmetries, if not the biggest, between electoral college and popular vote shares in history. Given that our central unit of analysis, national polling, speaks to popular vote estimates, this calls into question whether polling at the national level is informative enough on its own for predicting elections. It’s always important to bear in mind that 2016 is just one observation among many. At the same time, two of the nation’s five electoral college-popular vote splits have occurred in just the last 16 years, and this latest one is the largest split (measured by raw votes) in history.

In part due to the uneven geographic distribution of key Democratic constituencies, gauging the national popular vote by itself is no longer enough to properly assess the state of polling in the United States. A technical dive into the polling error of 2016 will have to wait for another time (other academics and pollsters have better handles on this than we do), as will a historical state-level polling analysis. Thus, while national polling turned out to be much more accurate than initially believed, that result sits against a backdrop of systematic state-level error that could not be cured by aggregation. Just the right amount of error in just the right number of states swung this election, a very low-probability outcome in retrospect. That should not distract, however, from the systematic error and serious problems with polling across several states.

Discussion

We have made much of the fact that national polling accuracy in the United States is both good and improving, with 2016 a continuation of this trend, albeit with a slight perturbation. What we have not discussed is why these dynamics exist, and what caused them in 2016. Future research could focus on the role of the media in influencing this past election, as well as the effect of debates and conventions. Even events unique to 2016 may have had a measurable impact: the WikiLeaks releases, the Clinton email controversy, or FBI Director Comey’s October surprise.

Still, it remains true that pollsters did well at the national level this year. Maybe they even did well enough to quell the storm of polling backlash that arose from public opinion misfires such as the 2015 UK election, the 2016 UK EU referendum, and the 2016 Colombia-FARC peace deal referendum. We may be a bit optimistic here, however, considering that our own analysis of state polling error suggests big flaws in measures of state-level public preference. Our argument is simply that polls did a pretty good job in 2016, but people have reason to call out the industry on its smaller, yet significantly more decisive, errors.

Even so, our analysis may be conflating polling quality with the stability of the electorate’s preferences. When we consider alternative explanations, apart from better polls, for why error across the year before the election has decreased in more recent cycles, the growth and strengthening of partisanship stands out. Unlike in the past (1980, for example, with its wild error), roughly 80 percent of the electorate’s preferences are now effectively set in stone. The stability of error over time, and its decline relative to 1980, 1992, and so on, may have more to do with this than with polling quality.

These questions require an even deeper historical and theoretical dive into pre-election polling error. While our analysis does suggest that polls have gotten much better over the years, we leave open the possibility that polling accuracy itself has not evolved much; rather, electoral preferences have become so stable that polls pick up on them much more easily.

Of course, polls are better now than in the past, and they were better still in 2016. With national error of a mere point, perhaps pollsters should turn their attention to their much larger deficit in accuracy at the state level. Then we would really know what’s what in electoral preferences.



Thanks all for reading today. Keep reading below for a guide to how we performed the analysis and to whom we owe some additional credit. If you’re not interested in the technical mumbo-jumbo, go ahead and scroll to the bottom of the page to sign up for our newsletter and like us on social media!



Research Design

Data sources

Polling data for U.S. elections from 1980 to 2012 and for UK elections from 1979 to 2010 were provided by Harry Enten of FiveThirtyEight, from whom we also got the original idea for this analysis. We collected polling data for the 2016 U.S. election through the Huffington Post Pollster API and turned to UK Polling Report for polling on the 2015 UK election. General election results for the two major U.S. and UK parties came from David Leip’s U.S. Election Atlas and UK Political Info, respectively.

Measuring Error

Our main interest in this analysis is the dynamics of error in United States presidential elections, especially in 2016. To that end, error is measured as the absolute difference between each election’s final Democratic win/loss margin and the polled Democratic margin on each day of that election cycle. In most elections we have polling data from nearly 350 days out, although recent cycles have data up to 550 days before Election Day.

We also wanted to capture party-specific trends in error. Instead of plotting daily error in the Democrat’s win margin, we here measure daily error in both parties’ raw vote shares. This lends itself well to observing whether one political party, or candidate, suffers more polling error than the other.
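A minimal sketch of this party-specific calculation in Python; the poll numbers are hypothetical, while the actual shares are the 2016 national popular vote results:

```python
import pandas as pd

# Hypothetical national polls with raw vote shares for each party.
polls = pd.DataFrame({
    "dem_share": [48.0, 47.5, 46.8],
    "rep_share": [44.0, 44.5, 45.1],
})

ACTUAL_DEM, ACTUAL_REP = 48.2, 46.1  # 2016 national popular vote shares

# Party-specific error: the absolute miss in each party's own share,
# rather than in the Democratic win margin.
polls["dem_error"] = (polls["dem_share"] - ACTUAL_DEM).abs()
polls["rep_error"] = (polls["rep_share"] - ACTUAL_REP).abs()
print(polls[["dem_error", "rep_error"]].mean())
```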

Measuring Polls

For all parts of this analysis, we use the last day of a poll’s field dates to represent the day on which the poll was conducted. This is not technically correct, as polls are conducted over multiple days, but assigning each poll to one particular day is a simple and justifiable proxy for its field dates. We could instead split polls over all of their field days, effectively creating a new poll for each day on which responses were collected, but our analysis doesn’t necessitate that approach, although others have taken it. We then convert each poll’s end date to its distance from Election Day. This lets us align elections on a common scale, treating one week out as ED-7, for example, even though it falls on different calendar dates in different cycles.
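In code, the alignment is a simple date difference. A minimal sketch in Python (the function name is our own, not from any polling library):

```python
from datetime import date

def days_to_election(poll_end_date: date, election_day: date) -> int:
    """Days between a poll's final field date and Election Day, so a
    poll ending one week out is ED-7 in any election cycle."""
    return (election_day - poll_end_date).days

# A 2016 poll that finished fielding on November 1 counts as ED-7:
print(days_to_election(date(2016, 11, 1), date(2016, 11, 8)))  # 7
```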

Daily error is thus the difference between each day’s polled Democratic win margin and the actual Democratic win margin. In computing daily error, we found that election cycles both recent and past conduct more polls as Election Day approaches, even though polling overall has become more common in the twenty-first century. On some days, no poll is released, but we still want error data for that day of the cycle. To fill in dates with no polling data, we use linear interpolation to approximate public opinion: if we have polling data for days A and C but not for day B, we set day B’s value along the straight line between the vote shares on A and C. The same procedure handles gaps longer than two days.
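Here is a minimal sketch of that interpolation step in Python, using pandas with hypothetical margins:

```python
import numpy as np
import pandas as pd

# Polled Dem margins on days 10, 7, and 6 before the election,
# with days 9 and 8 missing (no polls released).
days = pd.Index([10, 9, 8, 7, 6], name="days_to_election")
margins = pd.Series([4.0, np.nan, np.nan, 2.5, 3.0], index=days)

# Linear interpolation fills each gap along a straight line between
# the nearest observed days, handling gaps of any length.
filled = margins.interpolate(method="linear")
print(filled)  # days 9 and 8 become 3.5 and 3.0
```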

We emphasize error in margins rather than error in vote share estimates because the margin carries the most valuable information a poll confers: the difference in support between the two major candidates is what decides the election, not the raw numbers at which each polls.



Thanks for reading today, everyone. Tune in to my Twitter for data updates, and make sure you sign up for my newsletter to get notified of new posts.


