I spend most of my days now working with data. From collecting to tidying to analyzing to communicating, my workflow revolves around a good comprehension of the information at hand. In Garrett Grolemund and Hadley Wickham’s R for Data Science, visualization is the fourth (seventh, tenth, thirteenth … nth) step in the process by which we move from acquiring data to presenting insights from it.
Visualization is thus a crucial step for anyone working with data, but how do we make good visualizations? Scholars and journalists have already offered solid guidelines for what a good graphic looks like (I’ll direct you to Edward Tufte’s Envisioning Information), so my notes here are about where, rather than how, to make a good chart.
First and foremost, I recommend any serious analyst learn to work in the R programming language. It’s rather easy to get set up in RStudio and begin transforming and visualizing your data with the Tidyverse suite of packages for R. The developers have done a great job creating a grammar for data manipulation, visualization, and analysis that is accessible for newcomers and (most importantly, in my opinion) easy to read; each statement is sent (or “piped”) to another statement or function, so complex operations are spelled out step by step on separate lines of code. This is the solution I’ve chosen to use in my work over at my blog. Plus, the plots are scriptable and reproducible, so if you write a blog in R (like this one!) you can have your graphics regenerate simply by rebuilding the file (a process that you can automate for future report generation).
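To make the "piping" idea concrete, here is a minimal sketch, assuming the dplyr package is installed. The built-in `mtcars` dataset and the 100-horsepower cutoff are illustrative stand-ins, not anything from my own analyses.

```r
library(dplyr)

# mtcars ships with R; it stands in for your own data here.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                        # keep cars above 100 horsepower (arbitrary cutoff)
  group_by(cyl) %>%                           # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>% # average mileage and row count per group
  arrange(desc(mean_mpg))                     # most efficient group first

print(mpg_by_cyl)
```

Each line hands its result to the next, so the code reads top to bottom in the same order you would describe the steps out loud.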
In R and the Tidyverse, you will likely turn to a package called ggplot2, the most well-known visualization tool in R (for good reason). ggplot2 allows you to make static plots with a variety of geometric options and can export them in common image formats such as PNG, JPEG, and PDF. Plus, there are a variety of themes available to help you make your charts more visually appealing than the ones in base R. If you want interactive charts in R, I recommend using Plotly, another good tool that can be easily embedded in automated reports.
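As a rough sketch of what that looks like in practice, the snippet below builds a scatter plot with the ggplot2 package and saves it as a PNG. It assumes ggplot2 is installed; the `mtcars` data, the labels, and the `mpg_by_weight.png` file name are all placeholders chosen for illustration.

```r
library(ggplot2)

# Build the plot layer by layer: data and aesthetics, a geometry, labels, a theme.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Heavier cars get fewer miles per gallon"
  ) +
  theme_minimal()  # one of the built-in themes

# Write the plot to a temporary file; the extension determines the image format.
out_file <- file.path(tempdir(), "mpg_by_weight.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)
```

Because the plot is just an object built by a script, rerunning the file regenerates the image. If the plotly package is installed, `plotly::ggplotly(p)` should turn the same object into an interactive chart.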
Of course, where would the field be without the desktop powerhouse application Tableau? Tableau is the corporate solution for data visualization; it provides an out-of-the-box experience for creating plots and figures that cannot be beat if all you want are two simple plots, right now, with an Excel spreadsheet in hand.
When I’m away from my laptop (ha!) I often need a web solution to mock up graphics or share some quick data online. My favorite is Datawrapper, which offers a simple method for designing visualizations: all you have to do is paste in your data and choose the chart you want!
If you want permanent graphics (perhaps to host on your new blog, or maybe your newsroom is thinking about switching to a new platform), Datawrapper does more than create static figures: it also lets you make interactive charts that can be embedded in your website. This is one of the cooler features of the free online tool (though I have yet to use it).
Though there are some more serious data visualization tools out there for the more enterprising of us, the ones above are a good place to start for 90% of analysts. Most of the time we are not doing analyses serious enough to warrant full knowledge of D3.js anyway, and if you’re working in print, that particular interactive tool won’t ever be a part of your tool belt. And if you’re anything like me (working in R), it’s easier to learn a solution that works in the same language as your analysis.