When Are Hierarchical Models Useful?
Posted on September 7th, 2011
This generated quite a stir on Twitter. There was some back and forth that caused me to think more about exactly when hierarchical modeling is useful and when it offers only unnecessary complexity.
My initial reaction is that King's tweet is excellent advice in many situations. Hierarchical models offer a useful tool that can help summarize results from lots of separate regressions, but a good plot works just as well. However, when I am interested in modeling variation in coefficients across surveys, I am tempted to turn to hierarchical models, especially when the number of groups is large.
But I had no idea what sort of standards to apply. When is it okay to estimate each model separately and model the coefficients from these models using a second-stage, separate regression? I had some intuition, but no idea if it was right or wrong. So I did a few simulations. But first, let me elaborate on King's important point.
The Real Power of Hierarchical Models
The real power of hierarchical models is the ability to partially pool coefficients across groups. Suppose I have several individual-level survey data sets from different countries. I expect the intercepts to be different, but similar, across countries. After running the regressions separately on each country in the data set, I realize that I've probably under-estimated the smaller intercepts, and over-estimated the larger. I would like to adjust for this, and hierarchical modeling allows me to do this in a rigorous manner.
To illustrate, I simulated a hierarchical data set with a binary outcome variable, three individual-level explanatory variables, and an intercept that varies with a group-level covariate. Hierarchical modeling allows me to pool the intercept estimates of each group toward the overall mean, conditional on the group-level covariate. Another way to handle this would be country-level "fixed effects," estimating a separate intercept for each country. The two models are identical, except that the hierarchical probit model uses partial pooling and the probit with fixed effects does not.
In the left panel of the figure above, the red circles show the estimates from the hierarchical model and the black dots show the estimates from the non-hierarchical model. The solid red lines connecting the two indicate how the estimates change between the two models. Longer lines indicate more partial pooling. The dotted line shows the estimated mean intercept, conditional on the group-level covariate. Notice that the estimates from the hierarchical model (with partial pooling) are an intuitive compromise between the fixed effects (no pooling) and the dotted line (complete pooling).
While the amount of partial pooling is substantial in the left panel, it has almost disappeared in the second panel. This is a different simulation, with the number of individuals per group increased from 50 to 1000. There is almost no partial pooling in this model. This, I think, is King's point. Hierarchical models are cumbersome to discuss, estimate, and evaluate. If the size of the groups is large, then the key feature of hierarchical models is no longer useful and one might be better served by non-hierarchical models.
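To see why the pooling fades as groups grow, here is a minimal sketch under a simplifying assumption: a linear model with normal errors and known variances, not the probit used in the simulations above. In that textbook case the partially pooled intercept is a weighted average of the group's own estimate and the group-level prediction, with the weight on the group's own estimate equal to tau^2 / (tau^2 + sigma^2 / n). The values of tau (between-group standard deviation) and sigma (individual-level standard deviation), and the function names, are hypothetical choices for illustration only.

```python
def pooling_weight(n_per_group, tau=0.5, sigma=1.0):
    """Weight placed on a group's own (no-pooling) estimate.

    Assumes a normal-normal linear model with known variances.
    tau and sigma are illustrative values, not taken from the post.
    """
    sampling_var = sigma**2 / n_per_group  # noise in the group's own estimate
    return tau**2 / (tau**2 + sampling_var)


def partially_pooled(no_pooling_estimate, group_level_prediction,
                     n_per_group, tau=0.5, sigma=1.0):
    """Compromise between no pooling and complete pooling."""
    w = pooling_weight(n_per_group, tau, sigma)
    return w * no_pooling_estimate + (1 - w) * group_level_prediction


# A group whose own estimate is 1.0 while the group-level line predicts 0.0.
for n in (50, 1000):
    w = pooling_weight(n)
    est = partially_pooled(1.0, 0.0, n)
    print(f"n={n}: weight={w:.3f}, estimate={est:.3f}")
# n=50: weight=0.926, estimate=0.926
# n=1000: weight=0.996, estimate=0.996
```

With 1000 individuals per group the estimate barely moves from its no-pooling value, which matches the pattern in the second panel. How much pooling occurs at 50 per group depends entirely on the assumed tau and sigma.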
While partial pooling is the main advantage of hierarchical models, there are other nice features as well. One question that comes to mind when I consider using hierarchical models or estimating separate models for each survey is whether or not I want to model the variation in the coefficients across surveys. If I do, then I am tempted to go with a hierarchical model. If not, then separate regressions work fine.
Why is this? Well, coefficients are estimated, not known, so there is some uncertainty involved. Treating coefficients as known and regressing them on a set of group-level covariates does not account for this uncertainty. As the number of groups decreases and the uncertainty in the coefficients increases, the standard errors from the group-level model become increasingly biased downward.
It is fairly intuitive to think that if the uncertainty around the estimate of an observed group increases relative to the uncertainty of an unobserved group, then hierarchical models become more important. Suppose that you have estimated 20 regressions using surveys from 20 different countries. Suppose further that you think the intercept should vary with some country-level covariate. If you regress the estimated intercepts on the country-level covariate, then the regression accounts for only a single source of uncertainty. This uncertainty comes from only having observed a few cases. (In a sampling framework, I would call this sampling error.) If we had sampled different countries, or repeated the implied experiment, we would almost certainly observe different values. But this is only one source of uncertainty. We ignored the other source of uncertainty, the estimation error in each intercept, when we treated the intercept as known in the group-level regression. If this source of uncertainty is large relative to the sampling error, then it is more important to account for it. Some quick simulations verify this intuition.
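The two sources of uncertainty can be made concrete with a small sketch. It is not the simulation behind the figures in this post: it assumes a linear outcome with an intercept-only first stage (so each first-stage estimate is just a group mean) instead of a probit or logit, and every parameter value and name in it is hypothetical. It does not try to reproduce the standard-error comparison. It only shows that the scatter of the estimated intercepts around the group-level line mixes between-group variation (tau^2) with first-stage estimation noise (roughly sigma^2 / n).

```python
import numpy as np


def simulate_two_stage(n_groups=20, n_per_group=50, tau=0.3, sigma=1.0, seed=0):
    """Two-stage, separate-regressions sketch (all values illustrative)."""
    rng = np.random.default_rng(seed)

    # Group-level covariate and true intercepts that vary with it.
    u = rng.normal(size=n_groups)
    alpha = 1.0 + 0.5 * u + rng.normal(scale=tau, size=n_groups)

    # First stage: estimate each group's intercept separately.
    y = alpha[:, None] + rng.normal(scale=sigma, size=(n_groups, n_per_group))
    alpha_hat = y.mean(axis=1)
    first_stage_var = y.var(axis=1, ddof=1) / n_per_group  # squared std. errors

    # Second stage: regress the estimated intercepts on the covariate,
    # treating them as if they were known.
    X = np.column_stack([np.ones(n_groups), u])
    coef, *_ = np.linalg.lstsq(X, alpha_hat, rcond=None)
    resid = alpha_hat - X @ coef
    resid_var = resid @ resid / (n_groups - 2)

    return coef, resid_var, first_stage_var.mean()


coef, resid_var, noise_var = simulate_two_stage()
print("second-stage slope:", coef[1])
print("second-stage residual variance:", resid_var)
print("average first-stage estimation variance:", noise_var)
```

Under these assumptions the second-stage residual variance is centered near tau^2 + sigma^2 / n, so the second stage alone cannot tell the two sources apart. Raising n_per_group shrinks the estimation-noise share, while raising tau makes the between-group share dominate.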
Each row of this figure shows the estimated intercepts and the standard errors generated by a hierarchical model and the two-stage, separate regressions approach. Moving downward, the sampling uncertainty increases relative to the uncertainty surrounding the estimates. When the sampling uncertainty is low, the standard errors from the separate regressions approach are biased downward. However, as the sampling uncertainty increases relative to the uncertainty surrounding the intercepts, the two methods produce indistinguishable standard errors.
(I should note that when the amount of partial pooling is more substantial, hierarchical models can give smaller standard errors. This is because the sampling variability is smaller in the group level model after partial pooling.)
While simulations are easy and important, I care mainly about how models differ on real data. Obviously the data differ across areas of application. Therefore, it makes sense to try different methods and evaluate the differences for each application. If simple and complex methods give similar answers, it makes sense to rely primarily on the simpler method. However, in the case where the two methods disagree, it is important to understand what leads to the differences and choose between the two only after understanding this.
Finally, I show a quick empirical example. I gave a short presentation last year on cross-national turnout. I hypothesized (for theoretical reasons I can't get into here) that education should have a larger effect in countries where citizens have more control over which candidates get on the ballot. Using the CSES, I estimated different logit models for each country and then regressed the education coefficients on the amount of citizen control over the ballot.
This figure shows some support for the hypothesis. The effect of citizen control on the effect of education is statistically significant. But what happens when a hierarchical model is used rather than a two-stage, separate regressions approach? The standard errors actually get smaller. This suggests that there is a fair amount of partial pooling, and this is reasonable since some of the samples have fewer than 500 respondents and there is a lot of uncertainty around the effects shown above. The tool used doesn't seem to affect the inference in this case, but it could. A good next step in this analysis would be to plot the estimated coefficients using separate regressions and a hierarchical model to better understand the differences.
I pretty strongly agree with the sentiment of King's statement: complexity for complexity's sake is not helpful. However (and I think King would agree), this does not mean that hierarchical models should always be abandoned when an analyst has a large number of respondents in each group. One should consider a simpler, separate regressions approach when the number of observations within each group is large. But especially when the between-group variability is small relative to the uncertainty in the estimates, hierarchical models can help avoid standard errors that are too small.
In any case, I think it always makes sense to consider and compare several different methods, plotting and comparing inferences to understand both the similarities and differences among the approaches.