A data set on nuclear weapons and war; illustrates logistic regression with separation in a large sample

The data set from Bell and Miller (2015), used in Rainey (2016) and Rainey (2023) to illustrate logistic regression models with separation. These data reproduce the "Firth logit" results with "Kargil excluded" reported throughout Bell and Miller (2015); see the "Firth logit" column of Table B of their online appendix.

Usage

bm2015

Format

A data frame with 455,619 observations and 14 variables:

year: the year of the observation (i.e., dyad-year)
statea: the numeric COW code of the first state of the dyad
stateb: the numeric COW code of the second state of the dyad
warl2: whether a war occurred
onenukedyad: whether exactly one state has nuclear weapons
twonukedyad: whether both states have nuclear weapons
logCapabilityRatio: the natural log of the capability ratio, which measures the distribution of power between the two states
Ally: whether the two states have a formal alliance or nonaggression pact
SmlDemocracy: the smaller of the two Polity scores in the dyad
SmlDependence: measures economic interdependence; the smaller of the two states' trade dependence scores, where each score is a state's imports from and exports to its partner divided by its GDP
logDistance: the natural log of the distance between capitals of two states, or for large states, the distance between nearest ports
Contiguity: whether the two states are contiguous
MajorPower: whether at least one of the states is a major power
NIGOs: measures shared membership in intergovernmental organizations

For further details, see Rauchhaus (2009, pp. 266-268) and Pevehouse and Russett (2006, pp. 980-983).
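
The dyad-level codings described above can be sketched as follows. This is only an illustration with made-up state-level inputs; the columns `nuke_a`, `nuke_b`, `polity_a`, and `polity_b` are hypothetical, do not exist in bm2015, and are not necessarily how the original authors built the variables.

```r
# Hypothetical state-level inputs; none of these columns exist in bm2015.
dyads <- data.frame(
  nuke_a   = c(0, 1, 1, 0),    # 1 if state A has nuclear weapons
  nuke_b   = c(0, 0, 1, 1),    # 1 if state B has nuclear weapons
  polity_a = c(10, -7, 8, 3),  # Polity score of state A
  polity_b = c(6, 9, -2, 3)    # Polity score of state B
)

# Exactly one nuclear state vs. both states nuclear, coded 0/1.
dyads$onenukedyad <- as.integer(dyads$nuke_a + dyads$nuke_b == 1)
dyads$twonukedyad <- as.integer(dyads$nuke_a + dyads$nuke_b == 2)

# The smaller of the two Polity scores in each dyad.
dyads$SmlDemocracy <- pmin(dyads$polity_a, dyads$polity_b)

dyads
```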

References

Bell, Mark S., and Nicholas L. Miller. 2015. "Questioning the Effect of Nuclear Weapons on Conflict." Journal of Conflict Resolution 59(1): 74–92. doi:10.1177/0022002713499718.

Pevehouse, Jon, and Bruce Russett. 2006. "Democratic International Governmental Organizations Promote Peace." International Organization 60(4): 969–1000. doi:10.1017/S0020818306060322.

Rainey, Carlisle. 2016. "Dealing with Separation in Logistic Regression Models." Political Analysis 24(3): 339–355. doi:10.1093/pan/mpw014.

Rainey, Carlisle. 2023. "Hypothesis Tests Under Separation." Forthcoming in Political Analysis. doi:10.31235/osf.io/bmvnu.

Rainey, Carlisle. 2016. "priors-for-separation.zip" Replication Data for: Dealing with Separation in Logistic Regression Models. Harvard Dataverse, V1. doi:10.7910/DVN/VW7G2Q/MTJB9H.

Rauchhaus, Robert. 2009. "Evaluating the Nuclear Peace Hypothesis." Journal of Conflict Resolution 53(2): 258–77. doi:10.1177/0022002708330387.

Examples


# a simple example

bm <- crdata::bm2015

# formula to reproduce "Firth logit" column of Table B of Bell and Miller's online appendix.
f <- warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio +
  Ally + SmlDemocracy + SmlDependence + logDistance +
  Contiguity + MajorPower + NIGOs

# twonukedyad == 1 perfectly predicts warl2 == 0
table("warl2" = bm$warl2, "twonukedyad" = bm$twonukedyad)
#>      twonukedyad
#> warl2      0      1
#>     0 454752    805
#>     1     62      0

# logit model with separation
fit <- glm(f, data = bm, family = binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# Rainey (2023) shows that Wald tests can never reject the null hypothesis
#   when variables create separation
summary(fit)
#> 
#> Call:
#> glm(formula = f, family = binomial, data = bm)
#> 
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)          -3.84661    1.12637  -3.415 0.000638 ***
#> onenukedyad           0.91097    0.36852   2.472 0.013437 *  
#> twonukedyad         -13.28422  522.98501  -0.025 0.979735    
#> logCapabilityRatio   -0.64855    0.12393  -5.233 1.66e-07 ***
#> Ally                 -0.44117    0.35226  -1.252 0.210417    
#> SmlDemocracy         -0.07472    0.03182  -2.348 0.018860 *  
#> SmlDependence      -119.85637   48.96674  -2.448 0.014377 *  
#> logDistance          -0.68642    0.13235  -5.186 2.15e-07 ***
#> Contiguity            2.94974    0.38336   7.694 1.42e-14 ***
#> MajorPower            2.35680    0.39472   5.971 2.36e-09 ***
#> NIGOs                -0.02978    0.01181  -2.521 0.011706 *  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 1227.87  on 455618  degrees of freedom
#> Residual deviance:  909.72  on 455608  degrees of freedom
#> AIC: 931.72
#> 
#> Number of Fisher Scoring iterations: 19
#> 

# Rainey (2023) shows that the likelihood ratio (LR) test still works when
#   variables create separation.
fit0 <- update(fit, . ~ . - twonukedyad)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
anova(fit0, fit, test = "Chisq")
#> Analysis of Deviance Table
#> 
#> Model 1: warl2 ~ onenukedyad + logCapabilityRatio + Ally + SmlDemocracy + 
#>     SmlDependence + logDistance + Contiguity + MajorPower + NIGOs
#> Model 2: warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio + Ally + 
#>     SmlDemocracy + SmlDependence + logDistance + Contiguity + 
#>     MajorPower + NIGOs
#>   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1    455609     911.20                     
#> 2    455608     909.72  1    1.475   0.2246
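
The comments above contrast Wald and likelihood ratio tests under separation, and the description mentions Firth's penalized logit. The following is a minimal base R sketch of all three on a small made-up data set. It is not the bm2015 data and not the code behind Bell and Miller's or Rainey's results. The data frame `toy`, the helper `neg_pen_loglik()`, and the use of `optim()` to maximize Firth's penalized likelihood are assumptions made for illustration; a dedicated package would normally be used in practice.

```r
# Toy data (hypothetical): x == 1 perfectly predicts y == 0.
toy <- data.frame(
  x = rep(c(0, 1), each = 10),
  y = c(rep(c(0, 1), each = 5), rep(0, 10))
)

# Maximum likelihood: the estimate for x drifts toward -Inf and its
# standard error explodes, so the Wald p-value is close to 1.
fit_ml <- suppressWarnings(glm(y ~ x, data = toy, family = binomial))
wald_p <- coef(summary(fit_ml))["x", "Pr(>|z|)"]

# Likelihood ratio test: compare deviances with and without x.
fit_null <- glm(y ~ 1, data = toy, family = binomial)
lr_stat <- deviance(fit_null) - deviance(fit_ml)
lr_p <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Firth's penalized likelihood: log-likelihood + 0.5 * log det(X'WX),
# where W = diag(p * (1 - p)). We minimize its negative with optim().
X <- model.matrix(~ x, data = toy)
neg_pen_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  p <- plogis(eta)
  loglik <- sum(toy$y * eta - log1p(exp(eta)))
  info <- crossprod(X, X * (p * (1 - p)))  # Fisher information X'WX
  log_det <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
  -(loglik + 0.5 * log_det)
}
firth <- optim(c(0, 0), neg_pen_loglik, method = "BFGS")
firth_coef <- setNames(firth$par, colnames(X))

# Wald p-value, LR p-value, and the finite Firth estimate for x.
c(wald_p = wald_p, lr_p = lr_p, firth_x = unname(firth_coef["x"]))
```

In this saturated toy model, Firth's penalty is equivalent to adding 1/2 to each cell of the 2 x 2 table, so `firth_coef["x"]` should be close to -log(21), whereas the maximum likelihood estimate diverges.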