Now Out: My Course "Analyzing Polling and Election Data in R" at DataCamp!
Sep. 29, 2018
Learn R for data science by wrangling, visualizing, and modeling political data like polls and election results
Today marks the unofficial launch of my course for teaching the R programming language using political data. It has been a long 8 months of coding exercises, writing presentations, and course development that has intellectually paid off for me more than I originally anticipated. Now, I hope it will pay off for you!
The course, “Analyzing Polling and Election Data in R”, is an introduction to the R programming language that teaches data tidying with tidyr, data visualization with choroplethr, and data analysis with modeling (linear regression) and time series analysis (moving averages, LOESS smoothing). The course is designed to teach R to a wide audience; complete beginners, those familiar with R but not the tidyverse suite of packages, and those who already work with political data but are hoping to learn R are among those who will get the most out of the course. That being said, the analysis of 6 unique political datasets with relevant and insightful examples makes the content accessible to any student. And of course, the curriculum is on demand and self-paced, so it will fit your schedule, however hectic.
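To give a flavor of the time series side, here is a minimal sketch of a moving average and a LOESS trend. It is not taken from the course: the approval numbers are simulated, the column names are made up for illustration, and it uses base R's stats::filter() so that it runs without extra packages (rollmean() from the zoo package, which the course uses, does the same job).

```r
# Hypothetical, simulated approval ratings; not data from the course.
set.seed(42)
polls <- data.frame(
  day = 1:60,
  approve = 45 + cumsum(rnorm(60, mean = 0, sd = 0.5)) + rnorm(60, sd = 2)
)

# Trailing 7-poll moving average using base R.
# sides = 1 averages the current value and the 6 values before it,
# so the first 6 entries are NA.
k <- 7
polls$moving_avg <- as.numeric(
  stats::filter(polls$approve, rep(1 / k, k), sides = 1)
)

# LOESS smoothing: a local regression of approval on time.
# span controls how much of the data each local fit uses.
loess_fit <- loess(approve ~ day, data = polls, span = 0.5)
polls$loess_trend <- predict(loess_fit)

head(polls, 10)
```

A larger window k or a larger span gives a smoother line that reacts more slowly to new polls; smaller values track the data more closely but pass through more noise.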
Chester Ismay, senior curriculum lead at DataCamp, came to me with the idea to make a course teaching R with political data science in November of 2017. We quickly hit the ground running with engaging examples of useful R tips and tricks using polling data and election results. The original design for the course was submitted in February of 2018, and after a short hiatus we resumed development in the summer with the aim of publishing in late September. Thanks to David Campos for his hard work on course content and logistics and to Shon Inouye for catching many more errors in the course’s slides and exercises than I did. In sum, this was a group effort; we all worked long and hard on this course to make it the best it could be!
Check the course out here (and if you do decide to take it, be sure to finish all the exercises to earn your credit!), and read on for more background on the course and my plans to do more in the future to teach R for, and with, political analysis.
Overview and chapter descriptions
Course summary at DataCamp:
This is an introduction to the R programming language for data science and statistical analysis. This course teaches students how to wrangle, visualize, and model data with R by applying these techniques to real-world political data like public opinion polling and election results. The tools that you’ll use in this course, from the dplyr, ggplot2, and choroplethr packages, among others, are staples of data science and can be used to analyze almost any dataset you get your hands on. Students will learn how to mutate columns and filter datasets, graph points and lines on charts, make maps, and create models to understand relationships between variables and predict the future. This course is suitable for anyone who already has downloaded R and knows the basics, like how to install packages.
The course follows the philosophy that learning R is additive: wrangling helps you visualize, visualization helps you model, and modeling helps you make inferences. In other words, learning one tool helps you to learn another. But R is also reinforcing. Good data science is a cycle of wrangling, modeling, visualizing, inferring, and repeating. In keeping with this, the course is laid out as follows:
- Wrangling presidential job approval polls. You will start with a dataset of presidential job approval ratings over time and, using the rollmean() function, create a moving average of presidential approval ratings for every president since Harry Truman in 1948.
- Wrangling and visualizing US House and Senate polls. You will build on your knowledge from chapter 1 by learning lm(), the function for training linear regression models, and how to make a variety of visualizations with ggplot2, including line graphs, scatterplots, and trend lines. You will also learn about margins of error.
- Wrangling, visualizing (with maps), and modeling county-level results for the 2016 election and analyzing polls and votes from the UK’s Brexit referendum. You will learn how to map US geographic data with the choroplethr package and how to use geom_smooth() to draw linear trend lines in data. You will learn the ins and outs of summary() for evaluating regression models in R and explore bivariate and multivariate regression.
- Applied examples practicing all 3 skills for predicting the 2018 elections to the US House and the 2020 presidential election. You will use all of your data wrangling, modeling, and visualization skills to predict election results by training models on old data and making predictions on new data.
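As a rough illustration of the "train on old data, predict on new data" idea in the last chapter, here is a small sketch using lm(), summary(), and predict(). The data are simulated and the variable names (poll_margin, vote_margin) are hypothetical; none of this comes from the course's datasets or exercises.

```r
# Hypothetical, simulated data: past elections where a party's final
# polling margin is used to explain its actual vote margin.
set.seed(2018)
past <- data.frame(poll_margin = runif(30, min = -15, max = 15))
past$vote_margin <- 0.5 + 0.9 * past$poll_margin + rnorm(30, sd = 3)

# Train a bivariate linear regression on the "old" data.
fit <- lm(vote_margin ~ poll_margin, data = past)

# summary() reports coefficients, standard errors, and R-squared.
summary(fit)

# Predict outcomes for "new" polling margins,
# with a 95% prediction interval around each point estimate.
new_polls <- data.frame(poll_margin = c(-4, 2, 8))
preds <- predict(fit, newdata = new_polls,
                 interval = "prediction", level = 0.95)
preds
```

The prediction interval is the part worth dwelling on: it shows how wide the range of plausible outcomes is for a single new election, which is usually much wider than the uncertainty about the trend line itself.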
As far as I am concerned, this is only the tip of the iceberg for what you will learn in the course and what the course can enable you to do in your own life. We have worked hard to write lessons and exercises that teach you a comprehensive and externally useful data science skill set. Wrangling, modeling, and visualizing are part of any data science workflow, not just when analyzing political data. Though the lessons are useful, they are by no means the end game. As with any good course, you’ll experience the (hopefully inspiring) beginning of what you can accomplish on your own.
What comes next?
I am very excited that the course is finally out in the open and available to anyone and everyone who wants to learn R for political data analysis. This means of course that students can finally take the course (hooray! the work paid off!) but also that we can now receive valuable feedback from anyone who takes it. Thing X doesn’t make sense? Thing Y could be improved? Now we’ll hear from the actual people taking the course. I also want to start writing more R posts on this blog. Some of you have noticed that my R posts now include code that can be toggled on and off, which I think is a good first step. While I’ll be continuing this, I also will be releasing the code for some of the larger projects that I work on. Next steps are exciting. They often take us places we haven’t gone before. I’m looking forward to where we all go next!