Why You Shouldn't Conclude "No Effect" from Statistically Insignificant Slopes
Posted on June 16th, 2012
It is quite common in political science for researchers to run statistical models, find that a coefficient for a variable is not statistically significant, and then claim that the variable "has no effect." This is equivalent to proposing a research hypothesis, failing to reject the null, and then claiming that the null hypothesis is true (or discussing results as though the null hypothesis is true). This is a terrible idea. Even if you believe the null, you shouldn't use p > 0.05 as evidence for your claim. In this post, I illustrate why.
To demonstrate why analysts should not conclude "no effect" from insignificant coefficients, I return to a debate waged over blogs and Twitter about a NYT article. See Seth Masket's original take, my response, and Seth's recasting. The data come from Nate Silver's post, which adopts a more nuanced position that I think is appropriate in light of the data.
First, I simply plot the data. These are not the exact data used by Masket, but they are similar.
Notice that there is no clear pattern in the data and the best-fit line is almost flat. (By "best-fit," I mean least squares. Median-based regression tools find a slight negative slope that is still nowhere near significant.) No doubt, the best-fit line represents a tiny negative effect that is substantively meaningless.
Because these data are "very" consistent with a "no effect" hypothesis, we certainly cannot reject the null hypothesis... but that does not mean that we should believe there is no effect.
Concluding That a Variable Has an Effect
Before explaining why we should not take failure to reject the null as evidence for the null, it is worth reviewing the logic of hypothesis testing. To illustrate, I created a fake data set similar to the data discussed above. In this data set, however, the estimated effect is larger, just large enough to give a one-tailed p-value of 0.05--barely significant.
Based on these fake data, most analysts would have confidence that an increasing unemployment rate is associated with a decreasing margin of victory. This is standard practice and most likely the correct conclusion.
To see why this conclusion makes sense, let's look at a variety of relationships that are "plausible" based on the data. (The 50 plausible lines are created by simulating from the posterior distribution.) Notice that almost all of the plausible lines are negative--this is why we can confidently claim that unemployment is negatively related to the margin of victory.
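The "plausible lines" idea can be sketched in a few lines of code. What follows is a minimal Python sketch, not the R code used for this post. It fits a least-squares line and then draws 50 lines from a normal approximation to the posterior (flat prior, residual variance treated as known). The data values, the function names, and the approximation itself are assumptions made purely for illustration.

```python
import math
import random

def fit_line(x, y):
    """Least-squares fit; returns the pieces needed to simulate lines."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)  # residual variance estimate
    return x_bar, y_bar, slope, sigma2, sxx

def plausible_lines(x, y, n_draws=50, seed=0):
    """Draw (intercept, slope) pairs from an approximate posterior.

    Assumption: the slope and the line's height at mean(x) are treated as
    independent normals centered on their least-squares estimates.
    """
    rng = random.Random(seed)
    n = len(x)
    x_bar, y_bar, slope, sigma2, sxx = fit_line(x, y)
    lines = []
    for _ in range(n_draws):
        b = rng.gauss(slope, math.sqrt(sigma2 / sxx))
        center = rng.gauss(y_bar, math.sqrt(sigma2 / n))
        lines.append((center - b * x_bar, b))
    return lines

# Made-up numbers for illustration only (not the data from this post).
unemployment = [4.0, 4.5, 5.2, 5.5, 6.0, 6.8, 7.2, 7.5, 8.0, 9.5]
margin = [12.0, 3.0, 9.0, -2.0, 6.0, 1.0, 4.0, -5.0, 2.0, -3.0]

draws = plausible_lines(unemployment, margin)
share_negative = sum(1 for _, b in draws if b < 0) / len(draws)
print(f"Share of plausible lines with a negative slope: {share_negative:.2f}")
```

When nearly all of the simulated slopes share a sign, the data support a directional claim. When they are split, the data do not.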
This is the basic logic of hypothesis testing--conclude that your claim is correct only when the alternative claims would rarely generate data like those you observed.
Concluding That a Variable Has No Effect?
Now that we've seen how the process of hypothesis testing works, let's see how it works in the context of a failure to reject the null. The process seems to go something like this...
- A journalist makes a claim, such as "High unemployment makes Obama less likely to win reelection."
- An analyst comes along and finds that she cannot reject the null hypothesis of no relationship.
- Because of the failure to reject the null, the analyst concludes that the journalist is wrong and that higher unemployment has no effect on Obama's reelection chances.
This logic is flawed. Failure to reject the null is not evidence for the null. To see why, let's look at the set of plausible lines for the actual unemployment/margin of victory data.
Notice that many of these lines are entirely consistent with the hypothesis that higher unemployment is associated with a smaller margin of victory. How can we confidently conclude "no relationship" from data that were plausibly generated by a large negative effect? We can't (or at least we shouldn't).
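One way to make "plausibly generated by a large negative effect" concrete is to compute an interval for the slope instead of looking only at the p-value. The sketch below is a rough Python illustration with made-up numbers, not the data from this post. It uses a normal approximation, so a t-based interval would be somewhat wider in a sample this small. The function name `slope_interval` and the data values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def slope_interval(x, y, level=0.90):
    """Least-squares slope with an approximate (normal-based) interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * se, slope + z * se

# Made-up, roughly patternless data for illustration only.
unemployment = [4.2, 5.1, 5.6, 6.0, 6.6, 7.1, 7.4, 7.8, 8.3, 9.6]
margin = [3.0, -6.0, 11.0, 1.0, -8.0, 9.0, -2.0, 6.0, -7.0, 4.0]

slope, lower, upper = slope_interval(unemployment, margin)
print(f"slope = {slope:.2f}, approximate 90% interval = [{lower:.2f}, {upper:.2f}]")
```

If the interval runs from a substantively large negative slope to a substantively large positive one, the honest summary is "we can't tell," not "no effect."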
Finding data that are inconsistent with the null hypothesis is good evidence for the research hypothesis. However, failure to find data that are inconsistent with the null hypothesis is not good evidence that the null hypothesis is correct. To illustrate, I showed that although the unemployment/margin of victory data were used to support the claim that "unemployment does not influence margin of victory," the data are consistent with a wide range of strong negative relationships (and strong positive relationships).
In political science, we only rule out implausible effects. For example, we conclude a relationship is positive only when a non-positive effect would not often generate our observed data. However, accepting the null because you've failed to reject it perverts this logic. When two observed variables have no correlation, this is consistent with many positive and negative effects. When the number of observations is small, even large positive and negative relationships can generate data sets with no correlation. Therefore, when we observe scatterplots with no correlation, we should be cautious about ruling out substantively meaningful positive and negative effects.
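The small-sample point can be checked with a quick simulation. The following Python sketch is an illustration under assumed values, not an analysis of the data in this post. It repeatedly generates data from a real negative effect, fits a least-squares line, and records how often the slope fails to reach significance at the two-sided 0.05 level. The true slope, the noise level, the range of x, and the normal-approximation test are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def slope_and_se(x, y):
    """Least-squares slope and its standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def share_not_significant(true_slope, noise_sd, n_obs, n_sims=2000, seed=1):
    """Share of simulated data sets in which the slope is NOT significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided 0.05, normal approximation
    misses = 0
    for _ in range(n_sims):
        x = [rng.uniform(4.0, 10.0) for _ in range(n_obs)]
        y = [true_slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        slope, se = slope_and_se(x, y)
        if abs(slope / se) < z_crit:
            misses += 1
    return misses / n_sims

# Hypothetical setup: a true slope of -1 with noisy outcomes.
for n_obs in (10, 16, 50, 200):
    share = share_not_significant(true_slope=-1.0, noise_sd=5.0, n_obs=n_obs)
    print(f"n = {n_obs:3d}: slope not significant in {share:.0%} of simulations")
```

Every one of these simulated data sets comes from a world in which the effect is real, so each "not significant" result is a case where concluding "no effect" would be wrong. The share should shrink as the number of observations grows.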
R code here.