# Coefficient Plots in R

##### Posted on June 30th, 2012

One popular trend in presenting results is the "coefficient plot," an alternative to the table of regression coefficients. I am seeing this a little more often in political science research and have received a few requests for code, so I thought I'd write a post about it. One function (that I know of) is available in R to do this automatically, but I often want more customization than the function provide. In this post, I walk through my approach to building these coefficient plots and provide carefully commented code that I use to create them.

## Using the coefplot() function in the arm library

One good place to start is to create a quick plot using the coefplot() function in the arm library.

This function is written by Andrew Gelman and his team, but isn't publication quality. The function has several options that allow it to be customized somewhat, but I prefer total control. I like to build these plots up from scratch.

## An initial cut by hand

The first plot basically replicates the coefplot() output, but looks a lot better and offers more detailed descriptions of the variables.

I like this plot a little better than the coefplot() defaults, though I should point out the I could create something very similar to this using the options in the coefplot() function. However, there are a few things I don't like.

## Labelling the points directly

First, I think the graph is a little easier to read if the points are label directly. Rather than trace an imaginary line from left to right to decide what the points refer to, readers can simply read the labels directly from the plot.

This plot is a little better, but the scaling across the variables is much different, making them harder to compare.

## Making the points more comparable

Rather than plot the regression coefficients, I prefer to plot first-differences as explanatory variables move from their observed minimum to the observed maximum. This gives a sense of what variables have the largest substantive impact. This works similar to standardization, but generalizes a little easier to generalized linear models.

I feel pretty comfortable with this plot and it is not hard to create in R.

## Extending to generalized linear models

As a final example, I made a similar plot for a logistic regression.

## Concluding Remarks

There are a couple of things to keep in mind when using coefficient plots. First, each of these plot should probably contain a detailed caption explaining the graphs in detail. Second, while I feel that this style of plot improves upon a table of coefficients, one should consider carefully whether this type of plot is necessary at all.

The R code used to make these plots is available here.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 |
# Code to create 'coefficient plots' as an alternative to # tables for regression models. # June 29, 2012 # Written by Carlisle Rainey. # http://www.carlislerainey.com/ # If any errors are discovered, please let me know. # To make the comments readable, make sure to display the # code in a text editor with a full screen view. library(arm) # load the arm package (contains the coefplot function) library(alr3) # load the alr3 package (contains data set for illustration) data(highway) # load the highway data set from alr3 d <- highway # call the data set d for convenience m <- lm(Rate ~ Lwid + Shld + Lane + Len + ADT + Trks, data = d) # estimate the normal linear model # create a vector to store the variable names var.names <- c("lane width, in feet", "width in feet of outer\nshoulder on the roadway", "total number of lanes\nof traffic", "length of the highway\nsegment in miles", "average daily traffic\ncount in thousands", "truck volume as a percent\nof the total volume") ### Plot 1 - A Quick Way coefplot(m) ### Plot 2 - Labels outside graph # set the graphical parameters par( family = "serif", # I don't plot in anything but serif oma = c(0,0,0,0), # Since it is a single plot, I set the outer margins to zero. mar = c(5,10,4,2) # Inner margins are set through a little trial and error. ) # create an empty plot for total customization plot(NULL, # create empty plot xlim = c(-2, 2), # set xlim by guessing ylim = c(.7, length(var.names) + .3), # set ylim by the number of variables axes = F, xlab = NA, ylab = NA) # turn off axes and labels # add the data est <- coef(m)[-1] # conveniently store the estimates (minus the constant) se <- sqrt(diag(vcov(m)))[-1] # conveniently store the std. errors (minus the constant) for (i in 1:length(est)) { # loop over a counter the length of the estimate vector points(est[i], i, pch = 19, cex = .5) # add the points to the plot lines(c(est[i] + 1.64*se[i], est[i] - 1.64*se[i]), c(i, i)) # add the 90% confidence intervals lines(c(est[i] + .67*se[i], est[i] - .67*se[i]), c(i, i), lwd = 3) # add the 50% confidence intervals text(-2.9, i, var.names[i], xpd = T, cex = .8) # add the variable names } # add axes and labels axis(side = 1) # add bottom axis abline(v = 0, lty = 3, col = "grey") # add verticle line mtext(side = 1, "Linear Regression Coefficient", line = 3) # label bottom axis mtext(side = 3, "Linear Regression Model of\n the Accident Rate per Million Vehicle Miles", line = 1) # add title box() # add lines around the plot ### Plot 3 - Labels inside graph par( family = "serif", oma = c(0,0,0,0), mar = c(5,2,4,2) # margins adjusted to reflect changing the locations of the labels ) plot(NULL, xlim = c(-2, 2), ylim = c(.7, length(var.names) + .3), axes = F, xlab = NA, ylab = NA) est <- coef(m)[-1] se <- sqrt(diag(vcov(m)))[-1] for (i in 1:length(est)) { points(est[i], i, pch = 19, cex = .5) lines(c(est[i] + 1.64*se[i], est[i] - 1.64*se[i]), c(i, i)) lines(c(est[i] + .67*se[i], est[i] - .67*se[i]), c(i, i), lwd = 3) text(est[i], i, var.names[i], xpd = T, cex = .8, pos = 3) # add variable labels above the points } axis(side = 1) abline(v = 0, lty = 3, col = "grey") mtext(side = 1, "Linear Regression Coefficient", line = 3) mtext(side = 3, "Linear Regression Model of\nthe Accident Rate per Million Vehicle Miles", line = 1) box() ### Plot 4 - Standardized Coefficients # compute quantiles d0 <- m[[12]][-1] # pull out predictors from the data set lo <- apply(d0, 2, quantile, 0) # calculate the minimum of the predictors md <- apply(d0, 2, quantile, .5) # calculate the median of the predictors hi <- apply(d0, 2, quantile, 1) # calculate the maximum of the predictors # simulate from the posterior sims <- coef(sim(m, n = 1000)) par( family = "serif", oma = c(0,0,0,0), mar = c(5,2,4,2) ) # create an empty plot for total customization plot(NULL, xlim = c(-8, 8), ylim = c(.7, length(var.names) + .3), axes = F, xlab = NA, ylab = NA) # add the data for (i in 1:(length(coef(m)) - 1)) { x.lo <- md; x.lo[i] <- lo[i]; x.lo <- c(1, x.lo) # Set the predictor of interest at its minimum and others at their median. x.hi <- md; x.hi[i] <- hi[i]; x.hi <- c(1, x.hi) # Set the predictor of interest at its maximum and others at their median. sim.lo <- sims%*%x.lo # Compute predicted values from simulations. sim.hi <- sims%*%x.hi # Compute predicted values from simulations. sims.dif <- sim.hi - sim.lo # Compute the differences. q <- quantile(sims.dif, c(.05, .25, .5, .75, .95)) # Find the quantiles of interest. points(q[3], i, pch = 19, cex = .5) # Plot the medians of the difference. lines(c(q[1], q[5]), c(i,i)) # Plot the 90% confidence intervals. lines(c(q[2], q[4]), c(i,i), lwd = 3) # Plot the 50% confidence intervals. text(q[3], i, var.names[i], xpd = T, cex = .8, pos = 3) # Add the variable names. } axis(side = 1) abline(v = 0, lty = 3, col = "grey") mtext(side = 1, "First Difference as a Variable\nMoves from Its Minimum to Its Maximum", line = 3) mtext(side = 3, "Linear Regression Model of\nthe Accident Rate per Million Vehicle Miles", line = 1) box() ### Plot 5 - Logistic Regression d <- read.csv("http://www.carlislerainey.com/Files/anes1992.csv") # call the data set d for convenience m <- glm(Turnout ~ PartyID.Folded + Age + Female + Black + Union.Member + Education, family = binomial, data = d, x = T) var.names <- c("partisan strength", "age, in years", "female", "African-American", "union member", "education level") d0 <- m[["x"]][, -1] lo <- apply(d0, 2, quantile, 0) md <- apply(d0, 2, quantile, .5) hi <- apply(d0, 2, quantile, 1) sims <- coef(sim(m, n = 1000)) par( family = "serif", oma = c(0,0,0,0), mar = c(5,2,4,2) ) # create an empty plot for total customization plot(NULL, xlim = c(-.5, 1), ylim = c(.7, length(var.names) + .3), axes = F, xlab = NA, ylab = NA) # add the data for (i in 1:(length(coef(m)) - 1)) { # loop over a counter the length of the estimate vector x.lo <- md; x.lo[i] <- lo[i]; x.lo <- c(1, x.lo) x.hi <- md; x.hi[i] <- hi[i]; x.hi <- c(1, x.hi) sim.lo <- plogis(sims%*%x.lo) # be sure to convert the linear predictor using the link function sim.hi <- plogis(sims%*%x.hi) # be sure to convert the linear predictor using the link function sims.dif <- sim.hi - sim.lo q <- quantile(sims.dif, c(.05, .25, .5, .75, .95)) points(q[3], i, pch = 19, cex = .5) lines(c(q[1], q[5]), c(i,i)) lines(c(q[2], q[4]), c(i,i), lwd = 3) text(q[3], i, var.names[i], xpd = T, cex = .8, pos = 3) } axis(side = 1) abline(v = 0, lty = 3, col = "grey") mtext(side = 1, "First Difference as a Variable\nMoves from Its Minimum to Its Maximum", line = 3) mtext(side = 3, "Logistic Regression Model of\n(Self-Reported) Voting", line = 1) box() |