Why I Don't Like Coefficient Plots

Posted: 07.06.2012

Over the last few days, I've written a couple of posts about creating coefficient plots. I like them way better than tables, but I don't really see a need for them. In fact, I think they can be misleading. In this post, I explain why (for the most part) I do not include coefficient plots or regression tables in my papers and what I use in their place.

To understand why coefficient plots and regression tables can be misleading, let's return to an example that has come up a couple of times in the last few posts: a simple model of voter turnout. I run a logistic regression model and obtain the coefficients below.

If we are willing to interpret this regression model causally, then education, for example, seems to have a positive effect on the probability of turning out. However, that estimate holds all other explanatory variables in the model constant. Does this make sense? I don't think so. Education probably affects both union membership and partisan strength. To get a better estimate of the effect of education, we should probably leave these "intervening" variables, union membership and partisan strength, out of the model. (Andrew Gelman and Jennifer Hill discuss this a good bit in their book, but I don't come across it much in political science research.)
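A small simulation makes the intervening-variable point concrete. This is a minimal sketch on invented data, not the turnout model above: it assumes a linear data-generating process in which education raises union membership and partisan strength, all three raise a continuous turnout propensity, and everything is fit by ordinary least squares with NumPy rather than by logistic regression. The helper `ols` and every coefficient are hypothetical.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical linear DGP (all numbers invented for illustration):
# education raises union membership and partisan strength,
# and all three raise a continuous turnout propensity.
education = rng.normal(13, 2, n)
union = 0.05 * education + rng.normal(0, 1, n)
partisan = 0.10 * education + rng.normal(0, 1, n)
turnout = 0.02 * education + 0.10 * union + 0.10 * partisan + rng.normal(0, 1, n)

# True total effect of one more year of education:
# direct 0.02 + through union 0.05 * 0.10 + through partisanship 0.10 * 0.10
total_effect = 0.02 + 0.05 * 0.10 + 0.10 * 0.10

without_mediators = ols(turnout, education)[1]                # intervening variables left out
with_mediators = ols(turnout, education, union, partisan)[1]  # intervening variables held constant

print(f"true total effect: {total_effect:.3f}")
print(f"education only:    {without_mediators:.3f}")
print(f"with mediators:    {with_mediators:.3f}")
```

Under these assumptions, the education-only regression recovers the total effect (about 0.035), while holding the two intervening variables constant recovers only the direct effect (about 0.02) and understates how much education matters for turnout.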

If the coefficients for control variables (predictors other than the primary explanatory variables) cannot be directly interpreted, why include them in the main text? In the interest of transparency and reproducibility, it makes sense to make these available to others in an appendix, but journal space is too valuable to spend on information so far removed from the main argument.

Rather than hypothesizing about control variables, reporting their coefficients (in a table or graph), and "interpreting" them, I prefer a detailed presentation and discussion of the effects of interest. Control variables should be mentioned, perhaps in the research design section, but discussion of their estimates is unnecessary.

An Example

A paper that I recently revised and resubmitted to the American Journal of Political Science illustrates (I think) the power of this approach.

Using a formal model, I argue that, contrary to the large literature on comparative electoral institutions, proportional electoral rules actually reduce parties' incentives to mobilize. I make three specific claims.

  1. Competitiveness Hypothesis: In both single-member district plurality (SMDP) and proportional representation (PR) systems, the mobilization effort by a district’s parties increases as the district’s competitiveness increases.
  2. Disproportionality Hypothesis: At any level of competitiveness in a district, the mobilization effort by the district’s parties is greater in SMDP systems than in PR systems.
  3. Interaction Hypothesis: The (positive) marginal effect of a district’s level of competitiveness on the mobilization effort by the district’s parties is greater under SMDP rules than under PR rules.
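One way to see how the three hypotheses relate is to map them onto a linear interaction model. This is a sketch under an assumed specification, not the paper's actual model: $M_d$ is mobilization effort in district $d$, $C_d$ is competitiveness, and $S_d$ is a hypothetical indicator equal to 1 for SMDP and 0 for PR.

```latex
\begin{aligned}
M_d &= \beta_0 + \beta_1 C_d + \beta_2 S_d + \beta_3 (C_d \times S_d) + \varepsilon_d \\[4pt]
\frac{\partial\, \mathbb{E}[M_d]}{\partial C_d} &= \beta_1 + \beta_3 S_d
  && \text{Competitiveness: } \beta_1 > 0 \text{ (PR) and } \beta_1 + \beta_3 > 0 \text{ (SMDP)} \\[4pt]
\mathbb{E}[M_d \mid S_d = 1, C_d] - \mathbb{E}[M_d \mid S_d = 0, C_d] &= \beta_2 + \beta_3 C_d
  && \text{Disproportionality: } \beta_2 + \beta_3 C_d > 0 \text{ at every observed } C_d \\[4pt]
& && \text{Interaction: } \beta_3 > 0
\end{aligned}
```

Under this specification only the Interaction Hypothesis reduces to the sign of a single coefficient; the other two are claims about combinations of coefficients, which a coefficient table or plot does not display directly.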

The typical approach in political science would be to run a regression model, put the coefficients in a table (or graph), and star the ones that are statistically significant. A sophisticated researcher might include Brambor, Clark, and Golder's "marginal effect" plots.
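Brambor, Clark, and Golder's point is that in a model with a product term, the marginal effect of x depends on the modifier z, and so does its standard error. A minimal sketch, assuming a linear model y = b0 + b1·x + b2·z + b3·x·z and an estimated coefficient covariance matrix; the function name, argument order, and all numbers below are hypothetical.

```python
import numpy as np

def marginal_effect(beta, vcov, z_values, i_x=1, i_xz=3):
    """Marginal effect of x at each value of the modifier z, with its standard error.

    Assumes y = b0 + b1*x + b2*z + b3*x*z, so dy/dx = b1 + b3*z and
    Var(b1 + b3*z) = Var(b1) + z**2 * Var(b3) + 2*z*Cov(b1, b3).
    i_x and i_xz are the positions of b1 and b3 in beta.
    """
    beta = np.asarray(beta, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    z = np.asarray(z_values, dtype=float)
    effect = beta[i_x] + beta[i_xz] * z
    variance = vcov[i_x, i_x] + z**2 * vcov[i_xz, i_xz] + 2 * z * vcov[i_x, i_xz]
    return effect, np.sqrt(variance)

# Hypothetical estimates; order is [b0, b1, b2, b3].
# x is competitiveness, z is an SMDP indicator (0 = PR, 1 = SMDP).
beta = [0.1, 0.5, 0.2, 0.3]
vcov = np.diag([0.02, 0.01, 0.02, 0.01])
vcov[1, 3] = vcov[3, 1] = -0.004
effect, se = marginal_effect(beta, vcov, z_values=[0, 1])
for z, e, s in zip([0, 1], effect, se):
    print(f"z = {z}: effect {e:.3f}, 95% CI [{e - 1.96 * s:.3f}, {e + 1.96 * s:.3f}]")
```

The two rows give the effect of competitiveness under PR and under SMDP; a marginal-effect plot simply draws effect ± 1.96·se across the values of z. Note that the covariance term matters: ignoring it would misstate the SMDP standard error.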

Instead, I prefer a focused, detailed presentation of the quantities relevant to the hypotheses. I provide these figures below. Together with their detailed captions, the figures illustrate how powerful this approach can be.
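One common way to compute a hypothesis-focused quantity, rather than reporting raw coefficients, is to simulate it from the estimated sampling distribution of the coefficients. The sketch below is not the procedure used in the paper; it assumes the same hypothetical linear specification (mobilization on competitiveness, an SMDP indicator, and their product) and invented estimates, and it targets the Disproportionality Hypothesis: the SMDP-minus-PR difference in expected mobilization at each level of competitiveness.

```python
import numpy as np

def simulate_smdp_pr_difference(beta, vcov, competitiveness, n_sims=10_000, seed=0):
    """Simulated SMDP-minus-PR difference in expected mobilization at each competitiveness level.

    Assumes the hypothetical specification
        mobilization = b0 + b1*comp + b2*smdp + b3*comp*smdp,
    so the difference at a given comp is b2 + b3*comp.
    Returns the mean and a 95% simulation interval at each level.
    """
    rng = np.random.default_rng(seed)
    # Draw coefficient vectors from their (assumed normal) sampling distribution.
    draws = rng.multivariate_normal(beta, vcov, size=n_sims)  # shape (n_sims, 4)
    comp = np.asarray(competitiveness, dtype=float)
    diffs = draws[:, [2]] + draws[:, [3]] * comp              # shape (n_sims, len(comp))
    lower, upper = np.percentile(diffs, [2.5, 97.5], axis=0)
    return diffs.mean(axis=0), lower, upper

# Hypothetical estimates, not results from the paper; order is [b0, b1, b2, b3].
beta = np.array([0.1, 0.5, 0.2, 0.3])
vcov = np.diag([0.02, 0.01, 0.02, 0.01])
vcov[1, 3] = vcov[3, 1] = -0.004
grid = np.linspace(0, 1, 5)
estimate, lower, upper = simulate_smdp_pr_difference(beta, vcov, grid)
for c, est, lo, hi in zip(grid, estimate, lower, upper):
    print(f"competitiveness {c:.2f}: difference {est:.3f} [{lo:.3f}, {hi:.3f}]")
```

Plotting these intervals against competitiveness shows directly whether the difference stays above zero across the range, which is the claim the hypothesis makes; no individual coefficient answers that question.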



  • MZH

    Do you really mean "leave ... out of the model" (vs. "no need to report")?

    "we should probably leave the "intervening" variables of union membership and partisan strength out of the model"

    • Carlisle Rainey

      Yes. If we believe that education influences union membership and partisanship (and not vice versa), then we should leave these two variables out of the model.

      If we want to know the effect of an additional year of education, for example, then we shouldn't hold things constant that are likely to be influenced by education, and in turn influence the outcome variable. Doing so would bias our estimate. Gelman and Hill talk about this a lot.

  • MZH

    Could you please provide the page reference from Gelman & Hill? My understanding is that if you leave out these variables in a causal model, your coefficient estimates will be biased (due to endogeneity / omitted variable bias / confounding). [Honestly, I didn't read your post carefully so I may be misunderstanding something since endogeneity is a rather basic issue ... I came to your blog because of coefficient plots :) ]

    • Carlisle Rainey

      Just to be clear, suppose we want to estimate the relationship between X and Y. If we have a variable Z that causes both X and Y, it should be in the model. If we have a variable Z that is caused by X and causes Y, it should not be included in the model.

      I don't have my copy in front of me, but I think you'll find something helpful in Gelman and Hill around pp. 190-194.

  • Brett

    Interesting. But why not actually model the causal system that you believe is operating rather than using a GLM? If you have intervening variables, why not use a model that is more appropriate to estimate the mediation effect and which shows the causal path that you are hypothesizing, rather than just leaving some stuff out?

    • Carlisle Rainey

      As long as we are not interested in the mediation, we can safely ignore the mediating variables. On the other hand, if we want to argue that variables are indeed mediators, your approach is a good one.
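The rule in the exchange above (include a variable that causes both X and Y; exclude one that X causes and that in turn causes Y) and Brett's suggestion to model the mediation path explicitly can both be checked with a short simulation. This is a sketch on invented data under a linear model fit by OLS with NumPy; `ols`, the variable names, and all coefficients are hypothetical.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(7)
n = 200_000

# Hypothetical DGP: Z causes both X and Y (confounder);
# M is caused by X and causes Y (mediator).
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 1.0 * x + 2.0 * m + 1.5 * z + rng.normal(size=n)
# True total effect of X on Y: 1.0 direct + 0.5 * 2.0 through M = 2.0

no_confounder = ols(y, x)[1]        # omits Z: biased upward
with_confounder = ols(y, x, z)[1]   # adjusts for Z, leaves M out: about 2.0
with_mediator = ols(y, x, z, m)[1]  # also holds M fixed: about 1.0 (direct effect only)

# Modeling the mediation path explicitly (product-of-coefficients approach).
a = ols(m, x, z)[1]                 # effect of X on M
coefs = ols(y, x, z, m)
direct, b = coefs[1], coefs[3]      # direct effect of X, effect of M on Y
indirect = a * b

print(f"omit Z:         {no_confounder:.3f}")
print(f"adjust for Z:   {with_confounder:.3f}")
print(f"also control M: {with_mediator:.3f}")
print(f"direct {direct:.3f} + indirect {indirect:.3f} = {direct + indirect:.3f}")
```

Omitting the confounder Z inflates the estimate (about 2.73 here), adjusting for Z alone recovers the total effect of 2.0, and adding the mediator M leaves only the direct effect of 1.0. The last lines show that, in OLS, the total effect splits exactly into the direct effect plus the product of the X→M and M→Y coefficients; that additive decomposition is specific to linear models and does not carry over directly to logistic regression.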