A data set on nuclear weapons and war; illustrates logistic regression with separation in a large sample
Data set from Bell and Miller (2015), used in Rainey (2016) and Rainey (2023) to illustrate logistic regression models with separation. These data reproduce the "Firth logit" results with "Kargil excluded" reported throughout Bell and Miller (2015); see the "Firth logit" column of Table B of their online appendix.
Format
A data frame with 455,619 observations and 14 variables:
year
the year of the observation (i.e., dyad-year)
statea
the numeric COW code of the first state of the dyad
stateb
the numeric COW code of the second state of the dyad
warl2
whether a war occurred
onenukedyad
whether exactly one state has nuclear weapons
twonukedyad
whether both states have nuclear weapons
logCapabilityRatio
the natural log of the capability ratio, which measures the distribution of power between the two states
Ally
whether the two states have a formal alliance or nonaggression pact
SmlDemocracy
the smaller of the two Polity scores in the dyad
SmlDependence
measures economic interdependence; the smaller of each country's imports and exports with its partner divided by their GDP
logDistance
the natural log of the distance between capitals of two states, or for large states, the distance between nearest ports
Contiguity
whether the two states are contiguous
MajorPower
whether at least one of the states is a major power
NIGOs
measures shared membership in intergovernmental organizations
For further details, see Rauchhaus (2009, pp. 266-268) and Pevehouse and Russett (2006, pp. 980-983).
References
Bell, Mark S., and Nicholas L. Miller. 2015. "Questioning the Effect of Nuclear Weapons on Conflict." Journal of Conflict Resolution 59(1): 74–92. doi:10.1177/0022002713499718.
Pevehouse, Jon, and Bruce Russett. 2006. "Democratic International Governmental Organizations Promote Peace." International Organization 60(4): 969–1000. doi:10.1017/S0020818306060322.
Rainey, Carlisle. 2016. "Dealing with Separation in Logistic Regression Models." Political Analysis 24(3): 339–355. doi:10.1093/pan/mpw014.
Rainey, Carlisle. 2023. "Hypothesis Tests Under Separation." Forthcoming in Political Analysis. doi:10.31235/osf.io/bmvnu.
Rainey, Carlisle. 2016. "priors-for-separation.zip." Replication Data for: Dealing with Separation in Logistic Regression Models. Harvard Dataverse, V1. doi:10.7910/DVN/VW7G2Q/MTJB9H.
Rauchhaus, Robert. 2009. "Evaluating the Nuclear Peace Hypothesis." Journal of Conflict Resolution 53(2): 258–77. doi:10.1177/0022002708330387.
Examples
# a simple example
bm <- crdata::bm2015
# formula to reproduce "Firth logit" column of Table B of Bell and Miller's online appendix.
f <- warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio +
Ally + SmlDemocracy + SmlDependence + logDistance +
Contiguity + MajorPower + NIGOs
# twonukedyad == 1 perfectly predicts warl2 == 0
table("warl2" = bm$warl2, "twonukedyad" = bm$twonukedyad)
#>      twonukedyad
#> warl2      0      1
#>     0 454752    805
#>     1     62      0
# logit model with separation
fit <- glm(f, data = bm, family = binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Rainey (2023) shows that the Wald test can never reject the null hypothesis
# for a variable that creates separation
summary(fit)
#>
#> Call:
#> glm(formula = f, family = binomial, data = bm)
#>
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)
#> (Intercept)          -3.84661    1.12637  -3.415 0.000638 ***
#> onenukedyad           0.91097    0.36852   2.472 0.013437 *
#> twonukedyad         -13.28422  522.98501  -0.025 0.979735
#> logCapabilityRatio   -0.64855    0.12393  -5.233 1.66e-07 ***
#> Ally                 -0.44117    0.35226  -1.252 0.210417
#> SmlDemocracy         -0.07472    0.03182  -2.348 0.018860 *
#> SmlDependence      -119.85637   48.96674  -2.448 0.014377 *
#> logDistance          -0.68642    0.13235  -5.186 2.15e-07 ***
#> Contiguity            2.94974    0.38336   7.694 1.42e-14 ***
#> MajorPower            2.35680    0.39472   5.971 2.36e-09 ***
#> NIGOs                -0.02978    0.01181  -2.521 0.011706 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1227.87 on 455618 degrees of freedom
#> Residual deviance: 909.72 on 455608 degrees of freedom
#> AIC: 931.72
#>
#> Number of Fisher Scoring iterations: 19
#>
# Rainey (2023) shows that the likelihood ratio (LR) test still works well
# when a variable creates separation.
fit0 <- update(fit, . ~ . - twonukedyad)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
anova(fit0, fit, test = "Chisq")
#> Analysis of Deviance Table
#>
#> Model 1: warl2 ~ onenukedyad + logCapabilityRatio + Ally + SmlDemocracy +
#> SmlDependence + logDistance + Contiguity + MajorPower + NIGOs
#> Model 2: warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio + Ally +
#> SmlDemocracy + SmlDependence + logDistance + Contiguity +
#> MajorPower + NIGOs
#>   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1    455609     911.20
#> 2    455608     909.72  1    1.475   0.2246
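To see the contrast between the two tests in isolation, here is a minimal sketch in base R. The first part recomputes the LR p-value from the deviance difference reported in the table above (1.475 on 1 degree of freedom). The second part uses hypothetical toy data, not taken from `bm2015`, in which `x == 1` perfectly predicts `y == 0`. The names `toy`, `toy_fit`, and `toy_fit0` are illustrative assumptions.

```r
# Part 1: recompute the LR p-value from the reported deviance difference.
# The LR statistic is the drop in residual deviance between nested models.
lr_stat_reported <- 1.475
p_reported <- pchisq(lr_stat_reported, df = 1, lower.tail = FALSE)
round(p_reported, 2)

# Part 2: hypothetical toy data (NOT from bm2015) with separation.
# Among the 40 rows with x == 0 there are 15 events; among the 20 rows
# with x == 1 there are none, so x == 1 perfectly predicts y == 0.
toy <- data.frame(
  x = rep(c(0, 1), times = c(40, 20)),
  y = c(rep(c(0, 1), times = c(25, 15)), rep(0, 20))
)

# Maximum likelihood fit; the estimate for x drifts toward -Inf until
# glm() stops iterating, and its standard error grows along with it.
toy_fit <- suppressWarnings(glm(y ~ x, data = toy, family = binomial))
wald <- coef(summary(toy_fit))["x", ]
wald

# LR test: compare against the model that omits x.
toy_fit0 <- update(toy_fit, . ~ . - x)
lr <- anova(toy_fit0, toy_fit, test = "Chisq")
lr_p <- lr[["Pr(>Chi)"]][2]
lr_p
```

On these toy data the Wald z-statistic for `x` should sit close to zero because the standard error grows together with the estimate, whereas the LR test compares deviances and so can still detect the effect.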