APSA 2018 — Forecasting House Midterm Elections

By G. Elliott Morris / August 31, 2018


I am sadly unable to make it to the annual meeting of the American Political Science Association this year. Below are some notes on what I was preparing to say at the meeting before plans had to change at the last minute. Of course, if you’re reading this on my blog, you might as well just head on over to the forecasting page anyway!

A Polls-Plus-Fundamentals Forecasting Model for the 2018 House Elections

In absence of APSA, a memo on my technique to forecast the race for the House

For more information: email

The race for control of the House is playing out in roughly 70 districts rated as possibly competitive or worse for the incumbent party. As of September 1, 2018, the model (described below) suggests that Democrats are favored to win a majority of seats in the House, winning the popular vote by 8.2 (+/- 6) percentage points and picking up a net 39 seats in the median simulated election, with a 95% credible interval between 197 and 286 seats. The distribution of possible outcomes points to a 78% chance that Democrats win control of the chamber in November. The chance that Republicans hold onto the chamber exists almost exclusively in simulated outcomes where they also lose the national popular vote.

The modeling process works as follows. Similar to prior work that distills national information to the district level (Bafumi, Erikson, and Wlezien, 2014), I forecast outcomes for all 435 seats in the US House of Representatives in three steps: (1) predicting the national popular vote for the House, (2) predicting the outcome in every seat, and (3) simulating 50,000 possible outcomes of the House elections.

National popular vote

The first step in the model predicts the national popular vote with the equation White House party national popular vote ~ predicted November White House party margin in generic ballot polls + White House party average change in vote margin for special elections since the previous cycle. Variables for each year are calculated as follows:

With time-series data available for both variables, the predictive margin of error for the national vote forecast is different for every day in the election cycle. The leave-one-out root-mean-square error for this equation is roughly 3 percentage points at Labor Day and approaches 1.5 percentage points in the final days of the campaign.
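To make the leave-one-out idea concrete, here is a minimal sketch in plain Python: it refits a two-predictor linear model once per election year, each time holding that year out, and reports the root-mean-square error of the held-out predictions. The function names (`fit_ols`, `loo_rmse`) and the toy numbers are my own illustrative assumptions, not the model's code or its actual inputs.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, ys):
    """Ordinary least squares with an intercept; each row is [x1, x2]."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def loo_rmse(rows, ys):
    """Refit without each observation in turn and score the held-out one."""
    squared_errors = []
    for i in range(len(ys)):
        beta = fit_ols(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - predict(beta, rows[i])) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Synthetic placeholder data, NOT real election results.
# Each row: [generic ballot margin, average special election swing].
toy_rows = [[-6.0, -4.0], [2.0, 1.0], [-3.0, 2.0],
            [5.0, -1.0], [0.0, 3.0], [-8.0, -6.0]]
toy_votes = [-7.1, 2.4, -1.0, 3.2, 1.9, -9.8]
print(round(loo_rmse(toy_rows, toy_votes), 2))
```

Repeating this calculation with the inputs as they stood on each day of past cycles would give a day-specific error of the kind described above.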

District-level information

The process for forecasting the election in each congressional district happens in two steps. First, a “fundamentals-based” forecast for every seat is generated with the following linear model, fit using elastic net regression (combined L1 and L2 penalties), with coefficients for predicting the 2016 House elections using data from the 2012 and 2014 election cycles:

*In seats that were previously uncontested, the forecast model imputes a prediction of what the Democratic vote margin would have been.
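For reference, elastic net regression chooses the linear coefficients by minimizing squared error plus a blend of L1 and L2 penalties. One common parameterization (the form used by glmnet) is shown here; whether the model uses exactly this form, and which values of λ and α it uses, is not stated in this memo.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
  + \lambda\left[\alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
```

Here y_i is the district outcome being predicted, x_i is that district's vector of predictors, λ sets the overall penalty strength, and α mixes the two penalties (α = 1 is the lasso, α = 0 is ridge regression).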

We thus have a point estimate for every district, with a margin of error equal to 1.96 times the predictive root-mean-square error of the forecast when back-tested on the 2016 elections. This RMSE is roughly 7 points for incumbent-held seats and 9 points for open seats.

Second, the fundamentals forecast is updated with an average of available district-level polling. The polls are averaged with an exponentially decaying weight determined by a program identical to the one used for averaging national polls (unsurprisingly, though, the decay term is much lower for district-level polls). The average in each seat is then adjusted for the forecast trend in the national environment between day Z and election day. The two indicators are then combined using Bayes’ formula, with the fundamentals forecast serving as a conceptual informative prior and the polling average serving as conceptual observational data. They are weighted according to their predictive precision, equal to the inverse of (1) the variance of the fundamentals forecast, 7^2 for incumbent-held seats or 9^2 for open seats, and (2) the variance implied by the polling average’s margin of error at day Z, for every day in the campaign. The resulting indicator has a predictive margin of error equal to the square root of the inverse of the sum of the two precisions.
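The precision-weighted combination described above is the standard normal-normal conjugate update, and it is short enough to sketch. The snippet below also shows one way to compute an exponentially decaying poll average. The function names, the decay rate, and all example numbers (including the 5-point polling error) are illustrative assumptions on my part; only the 7-point fundamentals error for incumbent-held seats comes from the text.

```python
import math

def decayed_poll_average(polls, decay=0.05):
    """Average poll margins with weights exp(-decay * age_in_days).

    polls is a list of (age_in_days, margin) pairs. The decay rate here
    is a hypothetical value, not the one the model estimates.
    """
    weights = [math.exp(-decay * age) for age, _ in polls]
    weighted_sum = sum(w * margin for w, (_, margin) in zip(weights, polls))
    return weighted_sum / sum(weights)

def combine_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Normal-normal update: weight each source by 1 / variance."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * data_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd

# Hypothetical district: fundamentals say White House party -2 (sd 7);
# three made-up polls, with an assumed polling-average sd of 5 points.
poll_avg = decayed_poll_average([(2, -5.0), (10, -3.0), (25, 1.0)])
mu, sigma = combine_normal(-2.0, 7.0, poll_avg, 5.0)
print(round(poll_avg, 2), round(mu, 2), round(sigma, 2))
```

Because precisions add, the combined estimate is always more certain than either input alone, and it sits closer to whichever source has the smaller variance.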

For districts in Washington and California, an intermediate Bayesian updating is performed before the poll updating to inform the model about the change in the White House party’s share of ballots cast in the seat’s top-two first round of the general election.

Simulating the election

The final step in the process is to account for the uncertainty in the point estimates. This is done using a stochastic simulator employing the Monte Carlo method: for each of 50,000 draws, it randomly generates a national environment from a Gaussian distribution with mean equal to the projected White House party margin in the national popular vote and standard deviation equal to the predictive margin of error on that projection for day Z in the campaign.

In each of the 50,000 simulations, each congressional district also receives error varied according to a draw from a Gaussian distribution with mean equal to the posterior mu of the Bayesian-updated White House party margin in the district and sd equal to the sigma of the resulting posterior distribution (a conjugate normal, such that mu and sigma are analogous to the mean and standard deviation of the prior “fundamentals” forecast). The error is correlated between districts to account for the chance that district-level modeling and polling errors are shared. Across time, district results are correlated with a Pearson’s correlation coefficient approaching 0.24, a much more modest correlation than the one between, for example, states in a presidential election. Each simulation generates one value for the number of seats that Democrats win and one value for the national popular vote of that simulation.
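A simulator along these lines can be sketched in a few lines of Python. This memo does not spell out how the between-district correlation is induced, so the sketch assumes a simple one-factor structure: every district error mixes one shared standard-normal shock with its own independent shock, which gives any two districts a pairwise error correlation of rho. Reusing that shared shock for the national draw is also my assumption, as are the function names and the toy district values.

```python
import math
import random

def simulate_house(district_means, district_sds, national_mean, national_sd,
                   rho=0.24, n_sims=50_000, seed=0):
    """Monte Carlo draws of Democratic seat totals and national margins.

    All margins are White House party margins, so Democrats win a seat
    when the simulated district margin is negative.
    """
    rng = random.Random(seed)
    shared_weight = math.sqrt(rho)       # loading on the common shock
    own_weight = math.sqrt(1.0 - rho)    # loading on the district shock
    seat_counts, national_votes = [], []
    for _ in range(n_sims):
        z_shared = rng.gauss(0.0, 1.0)
        # Assumption: the national environment moves with the shared shock.
        national_votes.append(national_mean + national_sd * z_shared)
        dem_seats = 0
        for mu, sd in zip(district_means, district_sds):
            z = shared_weight * z_shared + own_weight * rng.gauss(0.0, 1.0)
            if mu + sd * z < 0.0:
                dem_seats += 1
        seat_counts.append(dem_seats)
    return seat_counts, national_votes

def majority_probability(seat_counts, seats_needed):
    """Share of simulations in which Democrats reach seats_needed."""
    return sum(c >= seats_needed for c in seat_counts) / len(seat_counts)

# Toy run: five hypothetical districts, not real posterior values.
toy_means = [-12.0, -3.0, 0.5, 4.0, 15.0]
toy_sds = [5.0, 5.0, 6.0, 6.0, 7.0]
counts, votes = simulate_house(toy_means, toy_sds, -8.2, 3.0, n_sims=2_000)
print(majority_probability(counts, 3))
```

With all 435 posterior means and standard deviations in place of the toy lists, `majority_probability(counts, 218)` would be the analogue of the headline probability; a pure-Python loop over 50,000 draws is slow, so a vectorized version would be the practical choice.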

The resulting 50,000 estimates are aggregated to produce the model’s ultimate probabilistic forecast: on the day before this presentation (September 1), a 78% chance that Democrats win control of the House. Refer back to page 1 for the model’s aggregate findings or contact the author for more.



