The Political News That We’re Googling Ahead of the Midterms

Sep. 19, 2018

Categories: Quick Posts, US Politics
Tags: Democrats, Midterms, Google Trends, R


Google came out with a new tool today that lets you explore and visualize patterns of elections-related search interest across the world. Importantly, it lets you visualize traffic at the congressional district level. Hey, that’s pretty neat.

If our question is “what issues are most important to voters?” the best thing to do, typically, is to ask them. Rankings of the Most Important Problem facing the country are available from polling organizations and stretch back decades. However, sometimes you don’t have polling data, or you want a different take. You can turn to Google Trends data for that.

library(tidyverse) # load the tidyverse for all of our wrangling and visualization needs
library(gtrendsR) # load the package for downloading Google Trends data
library(kableExtra) # load the package for making pretty tables

# download the trends data for four important topics
# use Google News searches (gprop = "news") rather than web searches, because people regularly search the web for things like health care for reasons unrelated to the news
trends_data <- gtrends(c("healthcare","supreme court","economy","immigration"),
                       geo = "US",gprop = "news", time = "today 12-m")

interest_over_time <- trends_data$interest_over_time # the data we want is stored as the "interest over time" object in the returned list

Below, I’ve sorted search traffic for “healthcare”, “supreme court”, “economy”, and “immigration” according to the dates when each got the most relative hits.

interest_over_time %>%  
  mutate(date = format(date, '%m/%d/%y')) %>% # format the date variable as MM/DD/YY
  arrange(desc(hits)) %>% # arrange the hits in descending order
  head(10) %>% # get the first ten rows
  select("Date" = date,
         "Keyword" = keyword,
         "Hits (Relative Search Interest, %)" = hits) %>% # rename some variables
  kable() %>% # pass to the table function in knitr
  kable_styling() # make it pretty

Date      Keyword        Hits (Relative Search Interest, %)
06/17/18  immigration    100
02/11/18  immigration    65
06/24/18  supreme court  59
06/24/18  immigration    53
01/28/18  immigration    52
01/21/18  immigration    50
02/04/18  immigration    46
06/10/18  immigration    45
01/07/18  immigration    41
01/14/18  immigration    41
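
A quick note on what the “Hits” column means: Google Trends reports interest on a relative scale, where the single highest point among all keywords and dates in a request is set to 100 and every other value is expressed relative to it. Here is a minimal sketch of that rescaling in base R. The raw counts and the name raw_counts are made up for illustration; Google does not publish raw volumes, and its real pipeline also samples searches and normalizes by total search volume.

```r
# made-up raw search counts for two keywords over three weeks (illustrative only)
raw_counts <- data.frame(
  date    = rep(as.Date(c("2018-06-10", "2018-06-17", "2018-06-24")), times = 2),
  keyword = rep(c("immigration", "supreme court"), each = 3),
  count   = c(450, 1000, 530, 40, 60, 590),
  stringsAsFactors = FALSE
)

# rescale so the largest value across all keywords and dates becomes 100
raw_counts$hits <- round(100 * raw_counts$count / max(raw_counts$count))

raw_counts
```

Because everything is scaled to one shared peak, adding or removing a keyword from the request can change every number in the result.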

Aside from some blips in the news cycle, immigration news has consistently drawn the most search interest in the US. In fact, “immigration” is the most-searched keyword (in our set of four) in 78 percent of the periods covered by the past year of Google Trends data, as the table below shows.

interest_over_time.spread <- interest_over_time %>% 
  spread(keyword, hits) # put all the hits for each term into their own column

# for every date, return the most searched term
keyword_cols <- 5:8 # positions of the four keyword columns after spread(); adjust if your gtrendsR version returns extra columns
most_searched <- sapply(seq_len(nrow(interest_over_time.spread)),
       function(x){
         row_hits <- unlist(interest_over_time.spread[x, keyword_cols]) # hits for each term on this date
         names(row_hits)[which.max(row_hits)] # which.max() returns the first maximum, so ties go to the first term
       }) %>% 
  table() %>% # get the occurrences for each term
  prop.table() %>%  # transform counts to proportions
  as.data.frame() %>% # make it into a data frame ...
  setNames(c("Keyword","Frequency")) # ... one that has names (stats::setNames, so no extra package is needed)

# format and make a table
most_searched %>% 
  mutate(Frequency=round(Frequency*100)) %>% # from decimal to integer
  select(Keyword,"Days as Most-Searched Term (%)" = Frequency) %>%
  kable() %>%  # make a table
  kable_styling() # make it pretty

Keyword        Days as Most-Searched Term (%)
economy        14
immigration    78
supreme court  8
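
The same tally can be sketched without hard-coded column positions by working on the long-format data directly. The example below is only an illustration: toy_trends is a made-up stand-in for interest_over_time (the values are not real Google Trends output), and it uses base R rather than the tidyverse so it runs on its own.

```r
# toy stand-in for interest_over_time: one row per date and keyword (values are made up)
toy_trends <- data.frame(
  date    = rep(as.Date(c("2018-06-10", "2018-06-17", "2018-06-24", "2018-07-01")), each = 3),
  keyword = rep(c("economy", "immigration", "supreme court"), times = 4),
  hits    = c(10,  45,  5,
               9, 100,  6,
               8,  53, 59,
              12,  12,  7),
  stringsAsFactors = FALSE
)

# split the rows by date, then keep the keyword with the most hits on each date;
# which.max() returns the first maximum, so ties go to the keyword listed first
top_keyword <- sapply(split(toy_trends, toy_trends$date),
                      function(d) d$keyword[which.max(d$hits)])

# share of dates on which each keyword came out on top, as a whole-number percent
top_share <- round(100 * prop.table(table(top_keyword)))
top_share
```

Splitting by date keeps the logic independent of how many metadata columns the download happens to include.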

Here’s what those data look like over time:

gg <- ggplot(interest_over_time,aes(x=date,y=hits,col=keyword,fill=keyword)) + 
  geom_line(size=0.9) +
  scale_color_brewer("",palette = "Set2") +
  labs(title="What political news are people searching for?",
       subtitle="Relative search traffic from Google Trends",
       x="Date",
       y="Relative Search Traffic")

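# plot_elliott() is not defined in this post or in the packages loaded above; it appears to be the author's own plotting helper
plot_elliott(gg,width=2800,height=1800,unit='px',res=340,debug=T,
             themearg =  theme(legend.position='top'))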

Figure 1: Plotting search interest for certain key news topics

You can see a spike in search traffic about the Supreme Court when Justice Anthony Kennedy retired in the summer. At the same time, search traffic on immigration rose as a crisis unfolded at the border in which migrant children were being separated (in some cases, permanently) from their parents.
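
Spikes like these can also be located programmatically rather than read off the chart: for each keyword, find the date with the highest relative hits. Here is a minimal base-R sketch, assuming a long-format data frame shaped like interest_over_time. The name toy_spikes and its values are made up for illustration.

```r
# toy stand-in for interest_over_time (values are illustrative, not real Google Trends output)
toy_spikes <- data.frame(
  date    = rep(as.Date(c("2018-06-10", "2018-06-17", "2018-06-24")), times = 2),
  keyword = rep(c("immigration", "supreme court"), each = 3),
  hits    = c(45, 100, 53, 4, 6, 59),
  stringsAsFactors = FALSE
)

# for each keyword, keep the row where its hits are highest
peaks <- do.call(rbind, lapply(split(toy_spikes, toy_spikes$keyword),
                               function(d) d[which.max(d$hits), ]))
rownames(peaks) <- NULL # drop the row names that rbind() carries over from split()

peaks
```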

Of course, I have to mention the obvious: Donald Trump. According to our search data, the midterms are really about him anyway. This certainly fits the narrative from the political science literature.

# get new trends data
trends_data <- gtrends(c("healthcare","supreme court","economy","immigration","trump"),
                       geo = "US",gprop = "news", time = "today 12-m")

interest_over_time <- trends_data$interest_over_time

# make the Trump graph
gg <- ggplot(interest_over_time,aes(x=date,y=hits,col=keyword,fill=keyword)) + 
  geom_line(size=0.9) +
  scale_color_brewer("",palette = "Set2") +
  labs(title="What political news are people searching for?",
       subtitle="Relative search traffic from Google Trends",
       x="Date",
       y="Relative Search Traffic")

plot_elliott(gg,width=2800,height=1800,unit='px',res=340,debug=T,
             themearg =  theme(legend.position='top'))             

Figure 2: Plotting search interest for certain key news topics and Donald Trump

Ultimately, does this all matter? How much? It’s hard to say. But the data are fun to wrangle and visualize and are helpful at a basic level, at least. Some issues do matter more than others. But which will matter most to the voters who will decide the outcome? That, we can’t know from this information.





