It’s been a month since I have addressed you all on my blog — I’ve been writing quite a bit on the Patreon page, and separately for my school work as I wrap up the semester — and I have some fun prediction news for you today.
If you have not yet seen, Dr. Chris Warshaw (Assistant Professor of Political Science at George Washington University) and I are building a model to see how well data from a large survey, the Cooperative Congressional Election Study (CCES), can be combined with a statistical technique called Multi-level Regression and Poststratification to forecast House elections. What is MRP and what do we find? This post will offer a few brief thoughts.
First things first, what is this method? Simply put, Multi-level Regression and Poststratification (MRP) is a statistical method that uses survey data collected at one geographic level to estimate the same quantity in smaller geographic areas. The technique does so by (1) estimating how well certain demographic (or other) “covariates” predict some variable of interest and then (2) using covariate data at the smaller geographic level to predict the same variable at that level. Here are two papers that use and explain MRP in a more applied, academic context. Here’s an excellent blog post by famous political scientist and statistician Dr. Andrew Gelman (at Columbia University).
In other words, MRP can take polls at the national level, match them up with demographic data at the state (or congressional district, in this case) level, and calculate estimates for those states (or districts). Imagine that we have a poll saying 60% of Americans support gay marriage, but we know that varies by state and want a solid prediction of what support for gay marriage looks like in those states.
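To make the second step concrete, here is a minimal Python sketch of poststratification alone. It assumes the multilevel regression has already been fit and has produced a predicted level of support for each demographic cell. The cell definitions, support values, state names, and population counts are all made up for illustration, and a real MRP model would also let the cell estimates vary by state.

```python
# Hypothetical cell-level predictions, standing in for the output of a
# multilevel regression fit to a national survey (not real estimates).
cell_support = {
    ("18-44", "college"): 0.75,
    ("18-44", "no_college"): 0.65,
    ("45+", "college"): 0.60,
    ("45+", "no_college"): 0.45,
}

# Hypothetical census counts of each demographic cell within each state.
state_cell_counts = {
    "State A": {
        ("18-44", "college"): 300,
        ("18-44", "no_college"): 300,
        ("45+", "college"): 200,
        ("45+", "no_college"): 200,
    },
    "State B": {
        ("18-44", "college"): 100,
        ("18-44", "no_college"): 200,
        ("45+", "college"): 200,
        ("45+", "no_college"): 500,
    },
}


def poststratify(support_by_cell, counts_by_cell):
    """Population-weighted average of cell-level predictions."""
    total = sum(counts_by_cell.values())
    weighted = sum(support_by_cell[cell] * n for cell, n in counts_by_cell.items())
    return weighted / total


state_estimates = {
    state: poststratify(cell_support, counts)
    for state, counts in state_cell_counts.items()
}

for state, estimate in state_estimates.items():
    print(f"{state}: {estimate:.1%}")
```

The two states share the same cell-level predictions and differ only in demographic mix, which is enough to pull their estimates apart.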
Chris has written academic papers using this method and approached me about doing so for House elections. For now, the plan is to take the method we’ve used to accurately, though retrospectively, forecast the 2016 House elections, extend it to past cycles (the CCES has been releasing surveys since 2006), and create a model that tells us a lot about representation in the House of Representatives. The model we’ve made gets most competitive 2016 congressional district elections correct.
The byproduct, of course, is a method that could potentially be used to forecast the 2018 House midterms. We’re not nearly at that stage yet, but perhaps we soon will be. In the meantime, we can explore what that might look like.
It is perhaps foolhardy to say that we can use polling data from October 2016, adjust it for identifiable change in the electorate since President Donald Trump’s election, and then predict what might happen in November… but one can try. While this method is by no means an official, or even imaginably accurate, forecast of the 2018 House midterms, it does provide additional information that is useful for consumers of election prognostication.
What I’m doing is simple:
- Take the 2016 Cooperative Congressional Election Study
- Adjust the raw data randomly so that the congressional vote intention questions match the current generic ballot polling average (Democrats +6.6%)
- Run the 2016 MRP model with the new data, making a prediction in every congressional district for the 2018 House midterms as if they were held today (as far as I know, they’re still scheduled for November 6).
- Plug those estimates into my normal simulation model for the contests (with adjusted error bars, as per the 2016 MRP projections).
I then repeat the adjustment, modeling, and simulation steps just once more to see how much a forecast movement in generic ballot polls (from Democrats +6.6% to Democrats +9.6%) would change the predictions.
- 2018 midterms, if held today (Dem. +6.6%): 222 seats, 55% chance of winning.
- If Democrats are up 9.6%: 235 seats, 65% chance of winning.
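For the curious, the random adjustment step could look something like the Python sketch below. This is an illustration of the general idea and not the exact procedure we used: it assumes every respondent is recorded simply as "D" or "R", and the function names `adjust_to_margin` and `margin` and the raw sample are hypothetical.

```python
import random


def margin(votes):
    """Democratic minus Republican share among all respondents."""
    return (votes.count("D") - votes.count("R")) / len(votes)


def adjust_to_margin(votes, target_margin, seed=0):
    """Randomly flip respondents until the D-minus-R margin is as close
    as possible to target_margin. Returns a new list."""
    rng = random.Random(seed)
    votes = list(votes)
    n = len(votes)
    # Flipping one respondent moves the margin by 2 / n.
    flips = round((target_margin - margin(votes)) * n / 2)
    source, dest = ("R", "D") if flips > 0 else ("D", "R")
    candidates = [i for i, v in enumerate(votes) if v == source]
    for i in rng.sample(candidates, min(abs(flips), len(candidates))):
        votes[i] = dest
    return votes


# Hypothetical raw sample of 1,000 respondents at Democrats -2.
raw = ["D"] * 490 + ["R"] * 510
adjusted = adjust_to_margin(raw, 0.066)   # target: Democrats +6.6%

print(f"raw margin:      {margin(raw):+.1%}")
print(f"adjusted margin: {margin(adjusted):+.1%}")
```

Because the flipped respondents are chosen at random, the adjustment ignores which kinds of voters have actually moved since 2016.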
These predictions are fun to make, sure, but they aren’t official for a variety of reasons. First, we’re still playing around with the method: the process by which we adjust old polling data to match today’s averages is ad hoc and ignores a lot of the theoretical motivation implied by using MRP. Chief among those motivations is the fact that people have predictable attitudes because of their race, education, age, gender, etc. The partisan breakdown of each group has changed since the 2016 elections, so our results should change accordingly too.
Second, the method is being used for academic purposes right now. We intend to write some sort of paper about this, and as such, it’s not primarily being developed for forecasting purposes. That comes later.
Third, we don’t have raw polling data for the 2018 House generic congressional ballot question. If we did, we could apply this same model to that data and have a real congressional-district-level snapshot of vote intention. We don’t, though, so we can’t make “real” projections for the 2018 midterms. However, there are some useful things we can say with these numbers.
First, what these “projections” can tell us is that the currently estimated 6-7% national margin Democrats need to take back the House majority [2] is about right. The MRP model makes a much more precise estimate, giving Democrats favored odds of a majority at a 6.4% national margin. This is perhaps the most useful insight provided by this approach.
Another (relatively minor) insight we can glean from Mr. P. is that the distributional differences in predicted Democratic outcomes hold across methods. In my actual 2018 forecast, there is a 15-seat difference between the median simulated Democratic outcome (224 seats) and the mean (AKA the average) simulated Democratic outcome (239 seats). That same pattern holds up here, though with a smaller 5-seat difference (235 vs. 240 in the D+9.6% model). This is of course the penultimate question of the analysis (the ultimate question being “who’s going to win?”).
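Here is a toy Python sketch of how the median and mean seat counts, plus a win probability, come out of a simulation of this kind. It is a stand-in for my simulation model and not the model itself: the district margins are randomly generated placeholders, the error sizes (`national_sd`, `district_sd`) are assumptions, and the function name `simulate_seats` is hypothetical. With symmetric inputs like these, it should not be expected to reproduce the median-versus-mean gap quoted above.

```python
import random
import statistics


def simulate_seats(district_margins, national_sd=0.04, district_sd=0.06,
                   n_sims=1000, seed=1):
    """Simulate Democratic seat totals. Each simulation draws one national
    error shared by all districts, plus an independent error per district."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        national_shift = rng.gauss(0, national_sd)
        seats = sum(
            1 for m in district_margins
            if m + national_shift + rng.gauss(0, district_sd) > 0
        )
        results.append(seats)
    return results


# Placeholder Democratic margins for 435 districts (not real estimates).
placeholder_rng = random.Random(0)
district_margins = [placeholder_rng.gauss(0.0, 0.25) for _ in range(435)]

sims = simulate_seats(district_margins)
median_seats = statistics.median(sims)
mean_seats = statistics.mean(sims)
win_prob = sum(s >= 218 for s in sims) / len(sims)   # 218 seats is a majority

print(f"median seats: {median_seats}")
print(f"mean seats:   {mean_seats:.1f}")
print(f"P(majority):  {win_prob:.0%}")
```

A gap between the median and the mean signals a skewed distribution of simulated outcomes, so comparing the two is a quick check on the shape of a forecast.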
Finally, a slightly more theoretical contribution for the forecasters reading (at least, for the foxes, rather than hedgehogs, among us): by adding to the pile of early 2018 indicators an additional method that relies upon a separate set of data, we are able to boost the relative power of our predictions. Because one method is not dependent upon the other, or the other’s source of data, and they arrive at the same conclusion, we can put more stock into the estimates either produces. In other words, when methods agree, we ought to be more certain of their signals than when approaches diverge. Of course there is a balancing act to be played here: if we get tunnel vision looking at flawed or biased indicators, we are more likely to be misled. Some may argue that’s precisely what happened in “missing” the 2016 presidential election. [3]
What Chris and I have created here is the groundwork for another high-quality projection of the 2018 House midterm elections, but we don’t have one quite yet. Keep on the lookout for that to be developed over the next few months. In the meantime, my main method of forecasting outcomes in November will be updated 4x daily at the usual link. As of now, Democrats are ahead in the race to capture the House of Representatives majority. Will their lead hold?
1. If Chris and I do end up deploying this method for 2018, rest assured, you will find a whole suite of graphics online.
2. This is a fuzzy estimate, of course; Democrats could win with a small 4% national margin or lose with a large 10% one. My current forecast model says that either of these scenarios happens about 35% of the time the corresponding vote margin occurs.
3. Of course, I gave Trump a 15% shot at winning the presidency. One-in-seven events happen all the time (roughly one out of seven times, in fact).