Latent Class Analysis with poLCA

On an airplane the other day, I learned of a method called latent class (transition) analysis, and it sounded like an interesting thing to try in R. Of course, as with everything R, There is a Package for That, called poLCA, written by none other than Drew Linzer (of Votamatic fame) and Jeffrey Lewis.

I wasn’t able to think of a good application for transition analysis specifically, but I did use Christopher’s ANES data to estimate latent “types” of respondents. The example fits a four-class model, and I’ll leave it as an exercise for the interested reader to assign subjective names to each class.
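For readers who want to try this without the ANES file used in the Gist, here is a minimal sketch that fits a four-class model to the `election` data set bundled with poLCA. The choice of indicator variables, the seed, and the number of restarts are assumptions made for illustration; none of them is taken from the Gist.

```r
library(poLCA)

# 'election' ships with poLCA (2000 ANES respondents rating the candidates'
# traits). Using it here is an assumption for illustration, not the data
# from the Gist.
data(election)

# Manifest variables go on the left-hand side; "~ 1" means no covariates.
f <- cbind(MORALG, CARESG, KNOWG, LEADG, DISHONG, INTELG) ~ 1

# nrep restarts the estimation from several random starting values and
# keeps the best fit, which guards against local maxima.
set.seed(1225)
lcaFit <- poLCA(f, data = election, nclass = 4, nrep = 5, verbose = FALSE)

# Estimated share of respondents in each latent class
round(lcaFit$P, 3)

# Class-conditional response probabilities for one item
# (rows are classes, columns are response categories)
round(lcaFit$probs$MORALG, 3)
```

The `probs` element is what you would inspect when deciding what to call each class.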

This Gist also attempts to improve on the default plot both by eschewing the 3-D effect, and by putting classes, rather than variables, in direct comparison with one another. Also, for what it’s worth, the plot code shows how to draw a bar plot when you have already computed counts or proportions: use stat="identity".
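As a small illustration of the stat="identity" point, the sketch below plots proportions that have already been computed. The numbers in `classProps` are hypothetical and entered by hand; they are not output from the model in the Gist.

```r
library(ggplot2)

# Hypothetical, hand-entered proportions (not results from the Gist):
# the share of each latent class giving each response to one survey item.
classProps <- data.frame(
  class      = rep(paste("Class", 1:4), each = 2),
  response   = rep(c("Agree", "Disagree"), times = 4),
  proportion = c(0.80, 0.20, 0.55, 0.45, 0.30, 0.70, 0.10, 0.90)
)

# stat = "identity" tells geom_bar() to use the y values as given instead of
# counting rows; position = "dodge" puts the classes side by side.
p <- ggplot(classProps, aes(x = response, y = proportion, fill = class)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = NULL, y = "Proportion", fill = NULL)

print(p)
```

In more recent versions of ggplot2, geom_col() is shorthand for the same thing.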

Thanks for celebrating Advent with us, and for your feedback and support. We’re taking a little break after tomorrow’s post, but we’ll be back better than ever next year!

Beautiful network diagrams with ggplot2


I don’t usually like describing my own work as “beautiful,” but with your permission I will make an exception today. There have been some requests for scripts illustrating the plotting of network diagrams with ggplot2, and today (for the winter solstice) we’re bringing you a really nice-looking way of doing just that.

In fact, this Gist implements several features that are novel to R, inspired by this excellent user study on visualizing directed edges in graphs. The code is written to allow the use of “tapered-intensity-curved” edges between nodes (see Figure 10 of the linked Holten and van Wijk paper), which were found to be significantly better than the standard arrow representation in a simple graph interpretation task.

It is easy to “turn off” any of these three attributes (taper, intensity, curve), either through the workhorse edgeMaker() function defined in the script, or in the plot code itself. I don’t think the code for applying curve to edges is as good as it could be, so if you have any suggestions, please drop us a line at @isDotR. Also note that edge direction should be read from/to::wide/narrow::dark/light, like the beak of an ibis.
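To make the three attributes concrete, here is one way a single edge could be drawn. This is a sketch under my own assumptions, not the edgeMaker() from the Gist: the helper `curvedEdge()` and its `curvature` argument are hypothetical, and the curve is a plain quadratic Bezier.

```r
library(ggplot2)

# Hypothetical helper (not the edgeMaker() from the Gist): interpolate one
# edge as a quadratic Bezier curve and record progress along it.
curvedEdge <- function(from, to, n = 100, curvature = 0.2) {
  mid   <- (from + to) / 2
  delta <- to - from
  # Control point: the midpoint pushed sideways, perpendicular to the edge.
  # curvature = 0 gives a straight line.
  control <- mid + curvature * c(-delta[2], delta[1])
  s <- seq(0, 1, length.out = n)
  data.frame(
    x = (1 - s)^2 * from[1] + 2 * (1 - s) * s * control[1] + s^2 * to[1],
    y = (1 - s)^2 * from[2] + 2 * (1 - s) * s * control[2] + s^2 * to[2],
    Sequence = s   # 0 at the "from" node, 1 at the "to" node
  )
}

edge <- curvedEdge(from = c(0, 0), to = c(1, 1))

# Taper and intensity: wide and dark at the origin, narrow and light at the
# target. Dropping either aesthetic, or setting curvature = 0, turns that
# attribute off. (ggplot2 before 3.4.0 uses size / scale_size() instead of
# linewidth / scale_linewidth().)
p <- ggplot(edge, aes(x, y)) +
  geom_path(aes(linewidth = 1 - Sequence, colour = Sequence),
            lineend = "round") +
  scale_linewidth(range = c(0.2, 3), guide = "none") +
  scale_colour_gradient(low = "grey20", high = "grey85", guide = "none") +
  theme_void()
print(p)
```

For a whole network you would build one such data frame per edge, add a grouping column, and pass `group` to geom_path().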

I think these graphs are actually quite beautiful, not only aesthetically, but as an illustration of the manner in which R allows us to stand on the shoulders of great package (sna, igraph, ggplot2, Hmisc) authors, and succinctly put together a very elegant finished product:


Plotting RealClearPolitics polling trends with a faux axis break

We’ve recently seen how to parse XML for the data that goes into producing graphs such as the one on this page, comparing the Romney vs. Obama polling average.

Today’s Gist shows how to approximately replicate that very figure, with two trend lines indicating each candidate’s polling average, plus an area plot of the difference between the two. This is done in ggplot2 by way of a semi-hack, which uses the scales package to transform the y-axis so that part of the continuous scale is “cut out.”

The figure above is built in 5 steps:

  1. Plot the two competing trendlines
  2. Add an area plot showing Obama’s (possibly negative) lead
  3. Write a function (scaleBreaker()) that takes a continuous vector and makes a break in it, between a lower and an upper threshold
  4. Use this function to make a custom axis transformation with trans_new()
  5. Add standard ggplot2 axis breaks and labels, apply the custom transformation, and draw a line to emphasize the disjunction
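The five steps above can be sketched roughly as follows. This is not the Gist's scaleBreaker(): the helpers `squeezeRange()` and `unsqueezeRange()` are hypothetical stand-ins that compress the unwanted interval instead of deleting it, so that the transformation stays invertible. The polling numbers are simulated, and the thresholds (10 and 40) are arbitrary choices.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for the Gist's scaleBreaker(): rather than deleting
# the interval between lower and upper, squeeze it by a constant factor so
# the transformation stays monotone and invertible.
squeezeRange <- function(x, lower, upper, squeeze = 0.05) {
  ifelse(x <= lower, x,
    ifelse(x <= upper,
      lower + (x - lower) * squeeze,
      lower + (upper - lower) * squeeze + (x - upper)))
}

unsqueezeRange <- function(y, lower, upper, squeeze = 0.05) {
  squeezedUpper <- lower + (upper - lower) * squeeze
  ifelse(y <= lower, y,
    ifelse(y <= squeezedUpper,
      lower + (y - lower) / squeeze,
      upper + (y - squeezedUpper)))
}

# Made-up polling averages, for illustration only
set.seed(1)
polls <- data.frame(day = 1:60)
polls$Obama  <- 47 + cumsum(rnorm(60, 0, 0.15))
polls$Romney <- 45 + cumsum(rnorm(60, 0, 0.15))
polls$Lead   <- polls$Obama - polls$Romney

# Step 4: wrap the pair of functions in a custom transformation
brokenTrans <- trans_new(
  name      = "broken",
  transform = function(x) squeezeRange(x, lower = 10, upper = 40),
  inverse   = function(y) unsqueezeRange(y, lower = 10, upper = 40)
)

# Steps 1, 2, and 5. Recent ggplot2 versions name the scale argument
# `transform`; `trans` is the older spelling.
p <- ggplot(polls, aes(x = day)) +
  geom_area(aes(y = Lead), fill = "grey70") +
  geom_line(aes(y = Obama), colour = "blue") +
  geom_line(aes(y = Romney), colour = "red") +
  geom_hline(yintercept = 25, linetype = "dashed") +  # marks the break
  scale_y_continuous(trans = brokenTrans,
                     breaks = c(-5, 0, 5, 40, 45, 50),
                     name = "Polling average / lead")
print(p)
```

Squeezing rather than cutting is a design choice: with `squeeze = 0.05`, the 30 points between the thresholds occupy the space of 1.5 points, which reads as a break while keeping every value mappable in both directions.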

By d-sparks

Tags: XML ggplot2 lubridate reshape2 scales

Elongating and stacking wide data

This post is in response to an is.R() reader’s “Ask us anything” query. In short, the reader has several .CSV files of World Bank Data, in which each row is a Country, each column is a Year, and each separate file contains a different variable (like population, GDP, etc.).

Today’s Gist illustrates how to use a simple loop to load, reshape, and then “stack” multiple data sets into a more usable form. The first part of the script creates and saves some random “World-Bank-like” files, which are then loaded and stacked in an iterative fashion.
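A compact version of that workflow might look like the following. The file names, countries, years, and values are all invented for the example, and the files are written to a temporary directory; the final dcast() step is my own addition, as one guess at what “more usable” could mean.

```r
library(reshape2)

# Simulate three "World-Bank-like" files: one row per country, one column
# per year. Variable names, countries, and values are all made up.
set.seed(42)
countries <- c("Argentina", "Brazil", "Chile")
years     <- as.character(2000:2004)
varNames  <- c("population", "gdp", "lifeExpectancy")
dataDir   <- tempdir()

for (v in varNames) {
  wide <- data.frame(Country = countries,
                     matrix(runif(length(countries) * length(years)),
                            nrow = length(countries),
                            dimnames = list(NULL, years)),
                     check.names = FALSE)
  write.csv(wide, file.path(dataDir, paste0(v, ".csv")), row.names = FALSE)
}

# Load each file, melt it to long form, tag it with its variable name,
# and append it to the running stack.
stacked <- NULL
for (v in varNames) {
  # check.names = FALSE keeps column names like "2000" from becoming "X2000"
  wide <- read.csv(file.path(dataDir, paste0(v, ".csv")), check.names = FALSE)
  long <- melt(wide, id.vars = "Country",
               variable.name = "Year", value.name = "Value")
  long$Variable <- v
  stacked <- rbind(stacked, long)
}

# Year comes back as a factor; convert via character to get the real years
stacked$Year <- as.integer(as.character(stacked$Year))

# Optionally spread the variables back out: one row per country-year
panel <- dcast(stacked, Country + Year ~ Variable, value.var = "Value")
head(panel)
```

Growing a data frame with rbind() inside a loop is fine for a handful of files; for many files, collecting the pieces in a list and calling do.call(rbind, ...) once is usually faster.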

By d-sparks

Tags: reshape2 ggplot2

Simplest possible heatmap with ggplot2

Featuring the lovely “Spectral” palette from ColorBrewer. This really just serves as a reminder of how to do four things I frequently want to do:

  1. Make a heatmap of some kind of matrix, often a square correlation matrix
  2. Reorder a factor variable, as displayed along the axis of a plot
  3. Define my own color palette with colorRampPalette()
  4. Use RColorBrewer, specifically the diverging “Spectral” scheme
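Those four things fit in a few lines. The sketch below uses the built-in mtcars data purely as a convenient example, and orders the variables by hierarchical clustering, which is one reasonable choice rather than necessarily the one the Gist makes.

```r
library(ggplot2)
library(reshape2)
library(RColorBrewer)

# 1. A square correlation matrix (mtcars is just a convenient built-in example)
corMat <- cor(mtcars)

# melt() on a matrix returns one row per cell: Var1, Var2, value
corLong <- melt(corMat)

# 2. Reorder the factor levels so similar variables sit next to each other;
#    ordering by hierarchical clustering is one choice among many.
varOrder <- rownames(corMat)[hclust(dist(corMat))$order]
corLong$Var1 <- factor(corLong$Var1, levels = varOrder)
corLong$Var2 <- factor(corLong$Var2, levels = rev(varOrder))

# 3 and 4. colorRampPalette() returns a function that interpolates the
# 11-colour diverging "Spectral" scheme to any number of colours.
spectral <- colorRampPalette(brewer.pal(11, "Spectral"))

p <- ggplot(corLong, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradientn(colours = spectral(100), limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation") +
  coord_equal()
print(p)
```

Fixing the limits at -1 and 1 keeps the middle of the diverging palette at zero correlation.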

Visually-weighted regression plots, with Zelig

As a follow-up to yesterday’s post on producing visually-weighted regression plots, here is some code which illustrates the production of similar plots, but using Zelig's convenient modeling and simulation functions.

This code was produced to assist a colleague, which just goes to show that the “Ask us anything” page really works!

By d-sparks

Tags: rstats graphics reshape2 Zelig ggplot2

Simple visually-weighted regression plots

There has recently been a lot of discussion of so-called “visually-weighted regression” plots.

Folk hero Hadley Wickham suggests that such plots would be easy to implement with ggplot2, and so I have attempted to prove him right.

The approach outlined in the following Gist would be easy to apply to any situation in which you have a matrix of replicated predictions or bootstrapped fits from a model: any such matrix would just take the place of the simYhats object.
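To show what such a matrix might look like, here is a minimal sketch. The model (a linear fit to the built-in mtcars data), the number of simulations, and the layout of `simYhats` (one row per grid value, one column per simulation) are my assumptions; the Gist's own simYhats may be arranged differently.

```r
library(MASS)     # mvrnorm()
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)   # stand-in model on a built-in data set

# Draw coefficient vectors from their approximate sampling distribution
set.seed(2012)
nSims    <- 500
simBetas <- mvrnorm(nSims, mu = coef(fit), Sigma = vcov(fit))

# Predictions on a grid: one row per x value, one column per simulation
xGrid    <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
simYhats <- cbind(1, xGrid) %*% t(simBetas)

# Long form. as.vector() reads the matrix column by column, so each
# simulation contributes a contiguous block of length(xGrid) rows.
simLong <- data.frame(
  wt   = rep(xGrid, times = nSims),
  sim  = rep(seq_len(nSims), each = length(xGrid)),
  yhat = as.vector(simYhats)
)

# Overplotting many faint lines makes the regions where fits agree darker
p <- ggplot(simLong, aes(x = wt, y = yhat)) +
  geom_line(aes(group = sim), alpha = 0.03) +
  geom_point(data = mtcars, aes(y = mpg)) +
  labs(x = "Weight", y = "Miles per gallon")
print(p)
```

A matrix of bootstrapped fits with the same shape could be dropped in for `simYhats` without touching the plotting code.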

The end-product of this example

By d-sparks

Tags: MASS ggplot2 graphics reshape2 rstats