A data set on nuclear weapons and war; illustrates logistic regression with separation in a large sample
Source: R/bm2015.R
The Bell and Miller (2015) data set, used in Rainey (2016) and Rainey (2023) to illustrate logistic regression models with separation. These are the data needed to reproduce the "Firth logit" results with "Kargil excluded" reported throughout Bell and Miller (2015); see the "Firth logit" column of Table B of their online appendix.
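Firth's logit keeps coefficient estimates finite under separation by penalizing the likelihood with Jeffreys' prior. The block below is a minimal base-R sketch of that idea on hypothetical toy data, not the bm2015 data, and it is not necessarily the routine Bell and Miller used. The helper name `firth_logit` and the toy vectors `x` and `y` are illustrative assumptions.

```r
# Hypothetical toy data with quasi-complete separation:
# every observation with x == 1 has y == 0 (mirrors twonukedyad and warl2).
x <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
y <- c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
X <- cbind("(Intercept)" = 1, x = x)

# Illustrative helper (an assumption, not part of crdata): Fisher scoring on
# Firth's modified score, X'(y - p + h * (0.5 - p)), where h holds the
# diagonal of the weighted hat matrix.
firth_logit <- function(X, y, max_iter = 100, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(max_iter)) {
    p <- drop(plogis(X %*% beta))             # fitted probabilities
    w <- p * (1 - p)                          # binomial variance weights
    info <- crossprod(X, w * X)               # Fisher information X'WX
    XW <- sqrt(w) * X                         # rows of X scaled by sqrt(w)
    h <- rowSums((XW %*% solve(info)) * XW)   # hat-matrix diagonals
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- drop(solve(info, score))
    beta <- beta + step
    if (max(abs(step)) < tol) break           # stop once the update is tiny
  }
  setNames(beta, colnames(X))
}

fit_firth <- firth_logit(X, y)
fit_firth
```

Maximum likelihood would push the coefficient on `x` toward negative infinity here. Because this toy model is saturated, the penalized estimate should match the familiar "add 0.5 to each cell" correction, giving a slope of log(0.1 / 0.9), roughly -2.197, and an intercept of 0.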
Format
A data frame with 455,619 observations and 14 variables:
year: the year of the observation (i.e., dyad-year)
statea: the numeric COW code of the first state of the dyad
stateb: the numeric COW code of the second state of the dyad
warl2: whether a war occurred
onenukedyad: whether exactly one state has nuclear weapons
twonukedyad: whether both states have nuclear weapons
logCapabilityRatio: the natural log of the capability ratio, which measures the distribution of power between the two states
Ally: whether the two states have a formal alliance or nonaggression pact
SmlDemocracy: the smaller of the two Polity scores in the dyad
SmlDependence: measures economic interdependence; the smaller of each country's imports and exports with its partner divided by their GDP
logDistance: the natural log of the distance between the capitals of the two states or, for large states, the distance between the nearest ports
Contiguity: whether the two states are contiguous
MajorPower: whether at least one of the states is a major power
NIGOs: measures shared membership in intergovernmental organizations
For further details, see Rauchhaus (2009, pp. 266-268) and Pevehouse and Russett (2006, pp. 980-983).
References
Bell, Mark S., and Nicholas L. Miller. 2015. "Questioning the Effect of Nuclear Weapons on Conflict." Journal of Conflict Resolution 59(1): 74–92. doi:10.1177/0022002713499718.
Pevehouse, Jon, and Bruce Russett. 2006. "Democratic International Governmental Organizations Promote Peace." International Organization 60(4): 969–1000. doi:10.1017/S0020818306060322.
Rainey, Carlisle. 2016. "Dealing with Separation in Logistic Regression Models." Political Analysis 24(3): 339–355. doi:10.1093/pan/mpw014.
Rainey, Carlisle. 2023. "Hypothesis Tests Under Separation." Forthcoming in Political Analysis. doi:10.31235/osf.io/bmvnu.
Rainey, Carlisle. 2016. "priors-for-separation.zip." Replication Data for: Dealing with Separation in Logistic Regression Models. Harvard Dataverse, V1. doi:10.7910/DVN/VW7G2Q/MTJB9H.
Rauchhaus, Robert. 2009. "Evaluating the Nuclear Peace Hypothesis." Journal of Conflict Resolution 53(2): 258–77. doi:10.1177/0022002708330387.
Examples
# a simple example
bm <- crdata::bm2015
# formula to reproduce "Firth logit" column of Table B of Bell and Miller's online appendix.
f <- warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio +
  Ally + SmlDemocracy + SmlDependence + logDistance +
  Contiguity + MajorPower + NIGOs
# twonukedyad == 1 perfectly predicts warl2 == 0
table("warl2" = bm$warl2, "twonukedyad" = bm$twonukedyad)
#>      twonukedyad
#> warl2      0      1
#>     0 454752    805
#>     1     62      0
# logit model with separation
fit <- glm(f, data = bm, family = binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Rainey (2023) shows that the Wald test can never reject the null hypothesis
# for a variable that creates separation
summary(fit)
#>
#> Call:
#> glm(formula = f, family = binomial, data = bm)
#>
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)
#> (Intercept)          -3.84661    1.12637  -3.415 0.000638 ***
#> onenukedyad           0.91097    0.36852   2.472 0.013437 *
#> twonukedyad         -13.28422  522.98501  -0.025 0.979735
#> logCapabilityRatio   -0.64855    0.12393  -5.233 1.66e-07 ***
#> Ally                 -0.44117    0.35226  -1.252 0.210417
#> SmlDemocracy         -0.07472    0.03182  -2.348 0.018860 *
#> SmlDependence      -119.85637   48.96674  -2.448 0.014377 *
#> logDistance          -0.68642    0.13235  -5.186 2.15e-07 ***
#> Contiguity            2.94974    0.38336   7.694 1.42e-14 ***
#> MajorPower            2.35680    0.39472   5.971 2.36e-09 ***
#> NIGOs                -0.02978    0.01181  -2.521 0.011706 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1227.87 on 455618 degrees of freedom
#> Residual deviance: 909.72 on 455608 degrees of freedom
#> AIC: 931.72
#>
#> Number of Fisher Scoring iterations: 19
#>
# Rainey (2023) shows that the likelihood-ratio (LR) test still works well
# when a variable creates separation
fit0 <- update(fit, . ~ . - twonukedyad)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
anova(fit0, fit, test = "Chisq")
#> Analysis of Deviance Table
#>
#> Model 1: warl2 ~ onenukedyad + logCapabilityRatio + Ally + SmlDemocracy +
#> SmlDependence + logDistance + Contiguity + MajorPower + NIGOs
#> Model 2: warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio + Ally +
#> SmlDemocracy + SmlDependence + logDistance + Contiguity +
#> MajorPower + NIGOs
#>   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1    455609     911.20
#> 2    455608     909.72  1    1.475   0.2246
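The contrast between the two tests can be checked by hand from the numbers printed above. The following base-R sketch recomputes the Wald and likelihood-ratio statistics for twonukedyad; the inputs are copied from the rounded output, so the results should agree with it only up to rounding. The variable names are illustrative.

```r
# Values copied from the summary() and anova() output above (rounded as printed)
est <- -13.28422            # twonukedyad coefficient estimate
se  <- 522.98501            # its standard error
dev_restricted <- 911.20    # residual deviance without twonukedyad
dev_full       <- 909.72    # residual deviance with twonukedyad

# Wald test: the standard error explodes under separation, so z is near zero
z <- est / se
p_wald <- 2 * pnorm(-abs(z))

# Likelihood-ratio test: difference in deviances, chi-squared with 1 df
lr <- dev_restricted - dev_full
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)

round(c(z = z, p_wald = p_wald, lr = lr, p_lr = p_lr), 4)
```

The Wald p-value is close to 1 because the huge standard error swamps the estimate, while the LR test depends only on the change in deviance and therefore stays informative.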