This is part of a series of short posts about politics that seeks to show how we use data science to learn more about the real world. Follow along here.
Since Elizabeth Warren’s announcement on New Year’s Eve, the 2020 presidential campaign (or at least the period of money-moving and strategy-making that political scientists call the “invisible primary”) seems to be ratcheting up rapidly. I, for one, am happy about this, partly because I’m a nutcase who likes the perpetual campaign and partly because it unlocks a treasure trove of new opinion polling.
While we won’t learn much about vote intention from these polls this early — as I joked on Twitter, you’re better off burning the topline than reading it — there is some amount of useful data contained within the favorability toplines and crosstabs.
PSA: What you should do with 2020 Democratic nomination polls taken any time before September 2019:
1) Burn then
2) Burn them again
G. Elliott Morris (@gelliottmorris), December 14, 2018
In this post, we’re going to analyze a poll of likely 2020 Democratic voters in Iowa from CNN. Per usual, you can find the data and code for this post on my GitHub.
The data are straightforward. I simply transcribed the CNN name recognition numbers directly from the polling PDF and calculated a measure called “net favorability,” equal to a candidate’s favorable rating minus their unfavorable rating. That way we’re looking at their relative popularity: the raw favorability numbers are obviously correlated with name recognition, whereas I think it’s less clear that net favorability would be.
poll <- readr::read_csv("../../data_no_export/post/2019_01_11_cnn_poll_favs/2019_01_11_cnn_poll.csv")
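If you want to see how the two measures are built, here is a minimal sketch in base R. The numbers are made up for illustration and are not the CNN/SSRS results, and the `fav` and `unfav` column names are my assumption about how you might store the raw ratings. I am also assuming that name recognition is simply the share of respondents with any opinion, favorable or unfavorable.

```r
# Hypothetical favorability numbers (percent); NOT the CNN/SSRS results.
poll_example <- data.frame(
  candidate = c("Candidate A", "Candidate B", "Candidate C"),
  fav       = c(80, 50, 20),  # percent favorable (assumed column name)
  unfav     = c(10, 15, 10)   # percent unfavorable (assumed column name)
)

# Name recognition: share of respondents with any opinion of the candidate
poll_example$name.rec <- poll_example$fav + poll_example$unfav

# Net favorability: favorable minus unfavorable
poll_example$fav.min.unfav <- poll_example$fav - poll_example$unfav

print(poll_example)
```

The resulting `name.rec` and `fav.min.unfav` columns match the names used in the rest of this post.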
Here is the table of name recognition and net favorability for each candidate:
library(dplyr)  # provides %>% and select()
library(knitr)
library(kableExtra)

# make a table
poll %>%
  select("Candidate" = candidate, "Name Recognition" = name.rec, "Net Favorability" = fav.min.unfav) %>%
  kable(caption = "2020 Democratic Candidates' Name Rec. and Favorability") %>%
  kable_styling(bootstrap_options = "hover") %>%
  footnote("Source: CNN/SSRS poll of Iowa Democratic caucusgoers")
| Candidate | Name Recognition | Net Favorability |
Source: CNN/SSRS poll of Iowa Democratic caucusgoers
Looking at the table, the hypothesis that better-known candidates are also more popular seems to hold: the candidates at the top of the table, the better known ones, have higher net favorability ratings than the candidates at the bottom, the lesser known ones. The data also look pretty linear. That relationship is easier to see in a plot than by staring at a table, so let’s make one:
library(ggplot2)
library(ggrepel)  # provides geom_label_repel()

# make a plot of name recognition (x) and favorability (y)
gg <- ggplot(poll, aes(x = name.rec, y = fav.min.unfav, label = candidate)) +
  geom_point(col = 'blue', size = 2, alpha = 0.8) +
  geom_label_repel() +
  # dashed least-squares trend line, no confidence band
  geom_smooth(method = 'lm', se = FALSE, linetype = 2, col = 'orange') +
  labs(title = "Early On, Name Recognition is Everything",
       subtitle = "In polls of the 2020 Democratic nomination, better known candidates have higher favorability ratings",
       x = "Percent with an Opinion about the Candidate",
       y = "Net Favorability",
       caption = "Source: CNN/SSRS poll of likely 2020 Iowa caucusgoers")

# preview() is not part of ggplot2; use print(gg) if it is not defined
preview(gg)
Tada! Our suspicion is confirmed, at least in this one poll. As name recognition increases, so does net favorability. This gives us some grounds to say that there’s an upside for lesser known candidates like Beto O’Rourke, Amy Klobuchar, or Kamala Harris once they actually jump in the race; as their name recognition goes up, so should their popularity.
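If you would rather put a number on the trend than eyeball it, you can fit the same kind of straight line that `geom_smooth(method = 'lm')` draws. This is only a sketch: the values below are invented for illustration and are not the CNN/SSRS numbers, so swap in the real `poll` data frame to get the actual slope and correlation.

```r
# Hypothetical values (percent); NOT the CNN/SSRS results.
poll_example <- data.frame(
  name.rec      = c(30, 45, 60, 75, 90),
  fav.min.unfav = c(12, 20, 31, 38, 52)
)

# Ordinary least squares: net favorability as a function of name recognition
fit <- lm(fav.min.unfav ~ name.rec, data = poll_example)

# Slope: change in net favorability per one-point rise in name recognition
slope <- unname(coef(fit)["name.rec"])

# Pearson correlation between the two measures
r <- cor(poll_example$name.rec, poll_example$fav.min.unfav)

print(slope)
print(r)
```

A positive slope and a correlation close to 1 would back up what the plot shows. With only a handful of candidates in a single poll, though, I would treat either number as descriptive rather than as a formal test.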
This could also be bad news for candidates who are already well known. This early, they have already shown voters who they are. If you believe that their image so far has been mostly positive, and that more bad things than good will come out over the course of the campaign (maybe they haven’t been properly scrutinized before and there is some damaging info just waiting to be leaked), then they cannot benefit from simply telling voters more about themselves. The CNN analyst Harry Enten touched on this and provided some data similar to mine, but we don’t really have anything to go on here but conjecture. We’ll just have to find out!
As the race heats up, your best bet is that lesser known candidates will become more popular as people hear about them, at least as measured by their favorability ratings (not vote share), all else being equal.
I hope you learned something! Again, check out the rest of this series of short posts about using data science for political analysis and let me know what you think below. If you want more, just come back next week!
- R for Political Data Science Week 7: The 2020 Twitter Primary
- R for Political Data Science Week 6: Just How Liberal Are the 2020 Democratic Candidates?
- R for Political Data Science Week 5: The Ideological Diversity of the American Electorate