R for Political Data Science Week 2: This Early Before 2020, It’s All About Name Recognition

Ahead of the 2020 Democratic primary, there’s a very clear relationship between being better known and better liked.

Jan. 11, 2019


This is part of a series of short posts about politics that seeks to show how we use data science to learn more about the real world. Follow along here.

Since Elizabeth Warren’s announcement on New Year’s Eve, it definitely seems like the 2020 presidential campaign (or at least the period of money-moving and strategy-making that political scientists call the “invisible primary”) is rapidly ratcheting up. I, for one, am happy about this, partly because I’m a nutcase who likes the perpetual campaign and partly because it unlocks a treasure trove of new opinion polling.

While we won’t learn much about vote intention from these polls this early — as I joked on Twitter, you’re better off burning the topline than reading it — there is some amount of useful data contained within the favorability toplines and crosstabs.

PSA: What you should do with 2020 Democratic nomination polls taken any time before September 2019:

1) Burn them
2) Burn them again

— G. Elliott Morris (@gelliottmorris) December 14, 2018

In this post, we’re going to analyze a poll of likely 2020 Democratic voters in Iowa from CNN. Per usual, you can find the data and code for this post on my GitHub.

The data are straightforward. I simply transcribed the CNN name recognition and favorability numbers directly from the polling PDF and calculated a measure called “net favorability,” equal to a candidate’s favorable rating minus their unfavorable rating. That way we’re looking at their relative popularity: raw favorability is obviously correlated with name recognition, whereas I think it’s less clear that net favorability would be.
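To make that arithmetic concrete, here is a minimal sketch of how the two measures could be built from raw favorable and unfavorable shares. The numbers are toy values invented for illustration, not figures from the CNN poll, and the column names `fav` and `unfav` are my assumptions. Treating name recognition as favorable plus unfavorable is also an assumption, based on reading it as the share of people with an opinion about the candidate.

```r
# Hypothetical toy numbers for illustration only; NOT taken from the CNN poll.
toy <- data.frame(
  candidate = c("candidate_a", "candidate_b"),
  fav       = c(60, 25),  # percent favorable (assumed column name)
  unfav     = c(20, 10)   # percent unfavorable (assumed column name)
)

# Assumption: name recognition = share with any opinion (favorable + unfavorable)
toy$name.rec <- toy$fav + toy$unfav

# Net favorability: favorable minus unfavorable
toy$fav.min.unfav <- toy$fav - toy$unfav

print(toy)
```

The two derived columns use the same names as the real data set (`name.rec` and `fav.min.unfav`), so the rest of the code in this post would run on a data frame built this way.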

poll <- readr::read_csv("../../data_no_export/post/2019_01_11_cnn_poll_favs/2019_01_11_cnn_poll.csv")

Here is the table of name recognition and net favorability for each candidate:


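# load packages for piping and table styling
library(dplyr)
library(knitr)
library(kableExtra)

# make a table, renaming the columns for display
poll %>% 
  select("Candidate" = candidate,
         "Name Recognition" = name.rec,
         "Net Favorability" = fav.min.unfav) %>%
  kable(caption = "2020 Democratic Candidates' Name Rec. and Favorability") %>%
  kable_styling(bootstrap_options = "hover") %>%
  footnote("Source: CNN/SSRS poll of Iowa Democratic caucusgoers")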
Table 1: 2020 Democratic Candidates’ Name Rec. and Favorability

Candidate      Name Recognition   Net Favorability
biden          96                 67
sanders        97                 52
warren         84                 44
o’rourke       64                 42
booker         61                 37
harris         59                 39
holder         58                 26
bloomberg      71                 26
klobuchar      46                 30
gillibrand     45                 30
castro         37                 17
delaney        36                 14
hickenloopr    33                 15
brown          31                 15
swalwell       30                 10
inslee         18                 3

Source: CNN/SSRS poll of Iowa Democratic caucusgoers

From the table, our hypothesis looks plausible: the candidates near the top, who are better known, tend to have higher net favorability ratings than the lesser known candidates near the bottom. The data also look roughly linear. That relationship is easier to see in a plot than by staring at a table, so let’s make one:

# load plotting packages (ggrepel provides geom_label_repel)
library(ggplot2)
library(ggrepel)

# make a plot of name recognition (x) and favorability (y)
gg <- ggplot(poll, aes(x = name.rec, y = fav.min.unfav, label = candidate)) +
  geom_point(col = 'blue', size = 2, alpha = 0.8) + 
  geom_label_repel() +
  # dashed least-squares line, no confidence band
  geom_smooth(method = 'lm', se = FALSE, linetype = 2, col = 'orange') +
  labs(title = "Early On, Name Recognition is Everything",
       subtitle = "In polls of the 2020 Democratic nomination, better known candidates have higher favorability ratings",
       x = "Percent with an Opinion about the Candidate",
       y = "Net Favorability",
       caption = "Source: CNN/SSRS poll of likely 2020 Iowa caucusgoers")

# display the plot
gg


Tada! Our suspicion is confirmed. As name recognition increases, so does net favorability. This provides some ground to say that there’s an upside for lesser known candidates like Beto O’Rourke, Amy Klobuchar, or Kamala Harris once they actually jump in the race; as their name recognition goes up, so should their popularity.
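If you want a number to go with the picture, one option is to compute the correlation and fit the same straight line that `geom_smooth(method='lm')` draws. The sketch below is not part of the original analysis. It retypes the values from Table 1 into a data frame (called `tbl` here, a name I made up) so that it runs on its own, and uses only base R’s `cor()` and `lm()`. Keep in mind this is 16 candidates from a single poll, so treat the output as descriptive, not causal.

```r
# Values retyped from Table 1 above so this snippet is self-contained
tbl <- data.frame(
  candidate = c("biden", "sanders", "warren", "o'rourke", "booker", "harris",
                "holder", "bloomberg", "klobuchar", "gillibrand", "castro",
                "delaney", "hickenloopr", "brown", "swalwell", "inslee"),
  name.rec = c(96, 97, 84, 64, 61, 59, 58, 71, 46, 45, 37, 36, 33, 31, 30, 18),
  fav.min.unfav = c(67, 52, 44, 42, 37, 39, 26, 26, 30, 30, 17, 14, 15, 15, 10, 3)
)

# Pearson correlation between name recognition and net favorability
r <- cor(tbl$name.rec, tbl$fav.min.unfav)

# Least-squares fit: net favorability as a linear function of name recognition
fit <- lm(fav.min.unfav ~ name.rec, data = tbl)

# Slope: change in net favorability per one-point change in name recognition
slope <- unname(coef(fit)["name.rec"])

print(r)
print(slope)
```

The slope is the more useful of the two for the argument here, since it says roughly how many points of net favorability go along with each additional point of name recognition across this set of candidates.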

This could also be bad news for candidates who are already well known. This early, they have already shown voters who they are. If you believe that their image so far has been mostly positive, and that the campaign will reveal more bad things than good about them (maybe they haven’t been properly scrutinized before and there is some damaging info just waiting to be leaked), then they cannot benefit from simply telling voters more about themselves. The CNN analyst Harry Enten touched on this and provided some data similar to mine, but we don’t really have anything to go on here but conjecture. We’ll just have to find out!

As the race heats up, your best bet is that lesser known candidates will become more popular as people hear about them, at least as measured by their favorability ratings (not vote share), all else being equal.

I hope you learned something! Again, check out the rest of this series of short posts about using data science for political analysis and let me know what you think below. If you want more, just come back next week!
