Well, to be specific, I mean measuring district compactness (a very interesting subject, see these three articles for starters). There are myriad ways of measuring the “oddness” of a shape, including comparing the area of the district to that of its circumcircle, the moment of inertia of the shape, the probability that a path connecting two random points stays within the polygon, etc.
In today’s Gist, I use the spatstat package to convert Congressional district shapefiles to owin objects, which can be very persnickety — meaning that for our present purposes I have just skipped over districts with overlapping polygons or other owin conversion obstacles. However, spatstat lets us do neat things with owin objects, including the calculation of the area and perimeter of polygons, which I use to compute and then plot a simple Area / Perimeter ratio measure of district compactness.
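To make the Area / Perimeter measure concrete, here is a minimal base R sketch. It does not use spatstat or owin objects; `polygon_compactness()` is a hypothetical helper that assumes a simple polygon whose vertices are given in order, and it computes the area with the shoelace formula.

```r
# Hypothetical helper (not part of spatstat): area, perimeter, and their ratio
# for a simple polygon whose vertices are listed in order, first vertex not repeated.
polygon_compactness <- function(x, y) {
  n <- length(x)
  nxt <- c(2:n, 1)  # index of the next vertex, wrapping back to the first
  area <- abs(sum(x * y[nxt] - x[nxt] * y)) / 2            # shoelace formula
  perimeter <- sum(sqrt((x[nxt] - x)^2 + (y[nxt] - y)^2))  # sum of edge lengths
  c(area = area, perimeter = perimeter, ratio = area / perimeter)
}

# Two shapes with the same area: a unit square and a long, thin sliver.
square <- polygon_compactness(c(0, 1, 1, 0), c(0, 0, 1, 1))
sliver <- polygon_compactness(c(0, 4, 4, 0), c(0, 0, 0.25, 0.25))
```

The sliver has the same area as the square but a longer perimeter, so its ratio is lower. Note that a raw Area / Perimeter ratio depends on the size of the shape as well as its form, so it is best read as a rough, relative measure.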
As you can see in the guilty-pleasure Spectral palette choropleth below (click it for a larger view), the least compact districts are, unsurprisingly, typically found in high-population-density areas. Also, you can use this map to find your way from Greensboro to Charlotte, via I-85.
You’ve already seen everyone else’s electoral map (see this amazing array of maps from 2008). How would you like to make your own?
Today’s Gist allows you to do just that — input (manually!) state-by-state results, and output a beautiful choropleth map of presidential election results!
This code, of course, can be used for any number of state-level measures, so you might want to bookmark this one for future use.
I really enjoy using the DW-NOMINATE data for examples, as I do here. Sometimes it’s useful to indicate regions in the background of a plot — perhaps two-dimensional regions of interest, perhaps one-dimensional periods in time. It’s not always obvious how to combine data from two data.frames to form one plot in ggplot2, so here is another example.
The trick seems to be that the “second” data frame needs to include all of the same variables that you are using from the “first” data frame, in name at least. That is, if you are plotting variables called “x”, “y”, and “z” from the first data frame, your second data frame needs to include variables named “x”, “y”, and “z”, even if you’re not plotting with those, and even if they are set to some arbitrary constant, as in df2$z <- 1.
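As a point of comparison, here is a minimal sketch of a different route to the same result: giving the background layer its own data and aesthetics and setting `inherit.aes = FALSE`, so the second data frame does not need matching column names. The data frames `df1` and `df2` and their contents are made up for illustration and are not the DW-NOMINATE data.

```r
library(ggplot2)

# Made-up "first" data frame: points with a third variable z mapped to colour.
df1 <- data.frame(x = 1:10, y = (1:10)^2, z = rep(c("a", "b"), 5))

# Made-up "second" data frame: one background region, spanning the full y range.
df2 <- data.frame(xmin = 3, xmax = 6, ymin = -Inf, ymax = Inf)

p <- ggplot(df1, aes(x = x, y = y, colour = z)) +
  # inherit.aes = FALSE stops this layer from looking for x, y, and z in df2.
  geom_rect(data = df2,
            aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
            inherit.aes = FALSE, fill = "grey80", alpha = 0.5) +
  geom_point()
```

The rectangle layer is added first so that it is drawn behind the points.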
Whenever possible, I try to save R graphic output in a vector format, typically pdf(). I also like to use the handy ggsave() function to do so, as it streamlines the process, and makes it easy to be consistent across formats.
However, at times it is necessary to use a bitmap graphical format, in which case I always prefer to use .pngs. The only downside is that, in Windows at least, png() produces non-anti-aliased graphical elements.
There is, of course, a package for that, called Cairo, which uses cairographics to produce bitmap images with transparency, alpha levels, and anti-aliasing. However, I could never figure out how to use Cairo with ggsave(), until now:
I only rarely have the occasion to need the convex hull of a set of points, but I love chull(), so I’d like to share an example of how to use it.
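For reference, a minimal sketch of chull() on a handful of made-up points: four corners of a square plus one interior point. chull() returns the indices of the points that lie on the hull, so the interior point drops out.

```r
# Made-up points: the corners of a unit square plus its centre.
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

# chull() returns the indices of the hull vertices, in order around the hull.
hull_idx <- chull(x, y)

# Repeat the first index at the end to get a closed outline.
hull_path <- c(hull_idx, hull_idx[1])
hull_xy <- cbind(x = x[hull_path], y = y[hull_path])
```

The coordinates in `hull_xy` can then be handed to lines() or polygon() to draw the outline on top of a scatterplot.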
This Gist also offers a pretty straightforward application of the Split-Apply-Combine strategy (see lines 40-44), which is consistently useful, but complicated enough that it probably deserves its own post.
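The Split-Apply-Combine idea can be sketched in base R as follows. This is an illustrative version with made-up data, not necessarily the approach used in the Gist: split the points by group, find each group's hull, then bind the pieces back into one data frame.

```r
# Made-up points in two groups.
pts <- data.frame(
  group = rep(c("a", "b"), each = 5),
  x = c(0, 2, 2, 0, 1,   5, 6, 7, 6, 6),
  y = c(0, 0, 2, 2, 1,   0, 2, 0, 1, 0.5)
)

# Split: one data frame per group.
pieces <- split(pts, pts$group)

# Apply: keep only the rows that lie on each group's convex hull.
hulls <- lapply(pieces, function(d) d[chull(d$x, d$y), ])

# Combine: stack the per-group hulls into a single data frame.
hull_df <- do.call(rbind, hulls)
```

The combined `hull_df` has one row per hull vertex, with the group label intact, which is a convenient shape for plotting one polygon per group.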
I don’t really care for the name “marimekko” or “mosaic,” but I do like this type of plot as a means of illustrating proportions in nested categorical data, or as an alternative to the parallel time series plots discussed here (see this rather amateurish example).
The Gist below is my attempt to make these somewhat complicated plots as simple as possible (here is another worthwhile approach). One problem that I run into when producing these mosaic plots is that every one seems different, but I have attempted to distill each box into four consistent parameters, and I include two distinct data examples.
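As a rough illustration of reducing each box to four numbers, here is a base R sketch that turns a made-up two-way table of counts into rectangle bounds. Treating the four parameters as xmin, xmax, ymin, and ymax is my assumption for this example; the Gist may define them differently.

```r
# Made-up counts: rows are segments (box widths), columns are responses (box heights).
counts <- matrix(c(30, 10, 20, 40), nrow = 2,
                 dimnames = list(segment = c("A", "B"), response = c("yes", "no")))

# Horizontal bounds: each segment's share of the grand total.
seg_share <- rowSums(counts) / sum(counts)
xmax <- cumsum(seg_share)
xmin <- xmax - seg_share

# Vertical bounds: each response's share within its segment, stacked.
within_seg <- counts / rowSums(counts)
ymax <- t(apply(within_seg, 1, cumsum))
ymin <- ymax - within_seg

# One row per box, with its four bounds.
boxes <- data.frame(
  segment  = rep(rownames(counts), times = ncol(counts)),
  response = rep(colnames(counts), each = nrow(counts)),
  xmin = rep(xmin, times = ncol(counts)),
  xmax = rep(xmax, times = ncol(counts)),
  ymin = as.vector(ymin),
  ymax = as.vector(ymax)
)
```

A data frame in this shape can be drawn with any rectangle-based plotting function, and the box areas sum to one because each area equals that cell's share of the total.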
In developing plots, I often use color (or “colour” in ggplot2 parlance) to reflect values of a third, non-X/Y, variable. Depending on the distribution of this Z variable, however, the effective color range can be narrow, making it difficult to discriminate between Z values, as in this plot:
As you can see, the bulk of the points are in the middle, yellow/white range, while green/blue and orange/red only appear near the edges. This is as it should be, since Z = X * Y, and there are relatively few extreme values. However, I am, in a sense, “wasting” a lot of the color range available to me. Fortunately, the scales package offers a function called trans_new(), which lets one build a custom scale transformation from an arbitrary function and its inverse, as I do below. (Thanks to “mnel” at stackoverflow.com.)
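Here is a minimal sketch of what such a transformation might look like. The signed square root is my choice for illustration, since it stretches values near zero and compresses the extremes; it is an assumption, not necessarily the function used in the original code.

```r
library(scales)

# Illustrative transformation (an assumption): signed square root.
# It keeps the sign of z, spreads out values near zero, and pulls in the tails.
signed_sqrt_trans <- trans_new(
  name      = "signed_sqrt",
  transform = function(x) sign(x) * sqrt(abs(x)),
  inverse   = function(x) sign(x) * x^2
)

z <- c(-9, -1, 0, 0.25, 4)
z_transformed <- signed_sqrt_trans$transform(z)
z_recovered   <- signed_sqrt_trans$inverse(z_transformed)
```

A transformation object like this can be supplied to a continuous ggplot2 colour scale through its transformation argument, so that colors are spread over the transformed values while the legend still shows the original Z values.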
Featuring the lovely “spectral” palette from ColorBrewer. This really just serves as a reminder of how to do four things I frequently want to do:
- Make a heatmap of some kind of matrix, often a square correlation matrix
- Reorder a factor variable, as displayed along the axis of a plot
- Define my own color palette with colorRampPalette()
- Use RColorBrewer, specifically the diverging “spectral” scheme
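The four steps above can be sketched roughly as follows, using the built-in mtcars data as a stand-in. The choice of variables and the use of hierarchical clustering to order the axes are assumptions made for this example.

```r
library(RColorBrewer)

# 1. A square correlation matrix, reshaped to one row per cell.
cors <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
long <- as.data.frame(as.table(cors), responseName = "r")

# 2. Reorder the factor levels so that similar variables sit next to each other
#    (ordering by hierarchical clustering is an illustrative choice).
ord  <- hclust(as.dist(1 - cors))$order
vars <- rownames(cors)[ord]
long$Var1 <- factor(long$Var1, levels = vars)
long$Var2 <- factor(long$Var2, levels = vars)

# 3. and 4. Interpolate the 11-colour diverging Spectral scheme into 100 colours.
spectral <- colorRampPalette(brewer.pal(11, "Spectral"))(100)
```

From here, `long` can be drawn as a tile plot with `r` mapped to fill and `spectral` supplied as the colour gradient; the axis order follows the factor levels set in step 2.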