
Rainey (2016) and Rainey (2023) use the Bell and Miller (2015) data set to illustrate logistic regression models with separation. These data reproduce the "Firth logit" results with "Kargil excluded" reported throughout Bell and Miller (2015); see the "Firth logit" column of Table B of their online appendix.
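Firth logit estimates maximize the log-likelihood plus one half the log-determinant of the Fisher information, a penalty that keeps the estimates finite even under separation. The sketch below is a minimal base-R illustration of that idea, using a modified-score Newton iteration with step-halving. The helper `firth_logit()` and the toy data are hypothetical; this is not the code used by Bell and Miller (2015) or Rainey, and for real analyses a maintained implementation such as `logistf::logistf()` is the safer choice.

```r
# Hypothetical helper: Firth-penalized logistic regression in base R (a sketch).
# X: model matrix including the intercept column; y: 0/1 response vector.
firth_logit <- function(X, y, max_iter = 100, tol = 1e-8) {
  # penalized log-likelihood: l(beta) + 0.5 * log det(X'WX)
  pen_loglik <- function(b) {
    p <- plogis(drop(X %*% b))
    info <- crossprod(X, X * (p * (1 - p)))
    sum(dbinom(y, 1, p, log = TRUE)) +
      0.5 * as.numeric(determinant(info, logarithm = TRUE)$modulus)
  }
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, X * w))        # inverse Fisher information
    h <- rowSums((X %*% info_inv) * X) * w        # hat-matrix diagonal
    score <- crossprod(X, y - p + h * (0.5 - p))  # Firth-modified score
    step <- drop(info_inv %*% score)
    # halve the step until the penalized log-likelihood does not decrease
    old <- pen_loglik(beta)
    while (pen_loglik(beta + step) < old && max(abs(step)) > tol) {
      step <- step / 2
    }
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  setNames(beta, colnames(X))
}

# Toy data (hypothetical): x == 1 perfectly predicts y == 0.
x <- rep(c(0, 1), each = 20)
y <- c(rep(c(0, 1), 10), rep(0, 20))
b <- firth_logit(model.matrix(~ x), y)
b
```

With a single binary predictor, Firth's estimates coincide with log odds computed after adding 0.5 to each cell of the 2-by-2 table, so here the intercept is about 0 and the slope is about log(0.5 / 20.5), roughly -3.71. An ordinary maximum likelihood fit of the same data instead pushes the slope toward minus infinity.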

Usage

bm2015

Format

A data frame with 455,619 observations and 14 variables:

year

the year of the observation (each observation is a dyad-year)

statea

the numeric Correlates of War (COW) country code of the first state in the dyad

stateb

the numeric Correlates of War (COW) country code of the second state in the dyad

warl2

whether a war occurred

onenukedyad

whether exactly one state has nuclear weapons

twonukedyad

whether both states have nuclear weapons

logCapabilityRatio

the natural log of the capability ratio, which measures the distribution of power between the two states

Ally

whether the two states have a formal alliance or nonaggression pact

SmlDemocracy

the smaller of the two Polity scores in the dyad

SmlDependence

economic interdependence: for each state, its imports plus exports with its partner divided by its GDP; the variable is the smaller of the two values

logDistance

the natural log of the distance between the capitals of the two states or, for large states, between their nearest ports

Contiguity

whether the two states are contiguous

MajorPower

whether at least one of the states is a major power

NIGOs

measures shared membership in intergovernmental organizations

For further details, see Rauchhaus (2009, pp. 266-268) and Pevehouse and Russett (2006, pp. 980-983).
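The "smaller of the two" variables and the nuclear-dyad indicators are simple functions of state-level inputs. The sketch below shows that arithmetic on three made-up dyad-years. The state-level column names (`polity_a`, `trade`, `gdp_a`, `nuke_a`, and so on) are hypothetical and do not appear in `bm2015`, and the original sources (Rauchhaus 2009; Pevehouse and Russett 2006) may differ in details such as how trade is measured.

```r
# Hypothetical state-level inputs for three dyad-years (not taken from bm2015).
dyads <- data.frame(
  polity_a = c(10, -7, 6),
  polity_b = c(8, 9, -3),
  nuke_a   = c(TRUE, TRUE, FALSE),
  nuke_b   = c(TRUE, FALSE, FALSE),
  trade    = c(50, 2, 0.5),     # assumed: bilateral imports plus exports
  gdp_a    = c(1000, 200, 40),
  gdp_b    = c(500, 400, 10)
)

dyads <- within(dyads, {
  # weak-link measures: take the smaller of the two states' values
  SmlDemocracy  <- pmin(polity_a, polity_b)
  SmlDependence <- pmin(trade / gdp_a, trade / gdp_b)
  # nuclear dyad indicators: exactly one state vs. both states
  onenukedyad   <- as.integer(xor(nuke_a, nuke_b))
  twonukedyad   <- as.integer(nuke_a & nuke_b)
})

dyads[, c("SmlDemocracy", "SmlDependence", "onenukedyad", "twonukedyad")]
```

Because `onenukedyad` and `twonukedyad` are mutually exclusive, the omitted category in the models in the examples is a dyad in which neither state has nuclear weapons.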

References

Bell, Mark S., and Nicholas L. Miller. 2015. "Questioning the Effect of Nuclear Weapons on Conflict." Journal of Conflict Resolution 59(1): 74–92. doi:10.1177/0022002713499718.

Pevehouse, Jon, and Bruce Russett. 2006. "Democratic International Governmental Organizations Promote Peace." International Organization 60(4): 969–1000. doi:10.1017/S0020818306060322.

Rainey, Carlisle. 2016. "Dealing with Separation in Logistic Regression Models." Political Analysis 24(3): 339–355. doi:10.1093/pan/mpw014.

Rainey, Carlisle. 2023. "Hypothesis Tests Under Separation." Forthcoming in Political Analysis. doi:10.31235/osf.io/bmvnu.

Rainey, Carlisle. 2016. "priors-for-separation.zip." Replication Data for: Dealing with Separation in Logistic Regression Models. Harvard Dataverse, V1. doi:10.7910/DVN/VW7G2Q/MTJB9H.

Rauchhaus, Robert. 2009. "Evaluating the Nuclear Peace Hypothesis." Journal of Conflict Resolution 53(2): 258–277. doi:10.1177/0022002708330387.

Examples


# a simple example

bm <- crdata::bm2015

# formula to reproduce "Firth logit" column of Table B of Bell and Miller's online appendix.
f <- warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio +
  Ally + SmlDemocracy + SmlDependence + logDistance +
  Contiguity + MajorPower + NIGOs

# twonukedyad == 1 perfectly predicts warl2 == 0
table("warl2" = bm$warl2, "twonukedyad" = bm$twonukedyad)
#>      twonukedyad
#> warl2      0      1
#>     0 454752    805
#>     1     62      0

# logit model with separation
fit <- glm(f, data = bm, family = binomial)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# Rainey (2023) shows that Wald tests can never reject the null hypothesis
#   when a variable creates separation
summary(fit)
#> 
#> Call:
#> glm(formula = f, family = binomial, data = bm)
#> 
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)          -3.84661    1.12637  -3.415 0.000638 ***
#> onenukedyad           0.91097    0.36852   2.472 0.013437 *  
#> twonukedyad         -13.28422  522.98501  -0.025 0.979735    
#> logCapabilityRatio   -0.64855    0.12393  -5.233 1.66e-07 ***
#> Ally                 -0.44117    0.35226  -1.252 0.210417    
#> SmlDemocracy         -0.07472    0.03182  -2.348 0.018860 *  
#> SmlDependence      -119.85637   48.96674  -2.448 0.014377 *  
#> logDistance          -0.68642    0.13235  -5.186 2.15e-07 ***
#> Contiguity            2.94974    0.38336   7.694 1.42e-14 ***
#> MajorPower            2.35680    0.39472   5.971 2.36e-09 ***
#> NIGOs                -0.02978    0.01181  -2.521 0.011706 *  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 1227.87  on 455618  degrees of freedom
#> Residual deviance:  909.72  on 455608  degrees of freedom
#> AIC: 931.72
#> 
#> Number of Fisher Scoring iterations: 19
#> 

# By contrast, Rainey (2023) shows that the likelihood ratio (LR) test works
#   well when a variable creates separation.
fit0 <- update(fit, . ~ . - twonukedyad)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
anova(fit0, fit, test = "Chisq")
#> Analysis of Deviance Table
#> 
#> Model 1: warl2 ~ onenukedyad + logCapabilityRatio + Ally + SmlDemocracy + 
#>     SmlDependence + logDistance + Contiguity + MajorPower + NIGOs
#> Model 2: warl2 ~ onenukedyad + twonukedyad + logCapabilityRatio + Ally + 
#>     SmlDemocracy + SmlDependence + logDistance + Contiguity + 
#>     MajorPower + NIGOs
#>   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1    455609     911.20                     
#> 2    455608     909.72  1    1.475   0.2246
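The same contrast between the Wald and likelihood ratio tests can be seen on a small simulated data set in which a binary predictor perfectly predicts the zeros of the outcome. The sketch below uses only base R (`glm()`, `deviance()`, `pchisq()`); the toy data are hypothetical and unrelated to `bm2015`. The LR statistic is the drop in deviance from the restricted to the full model, referred to a chi-squared distribution with one degree of freedom, which is the comparison `anova(fit0, fit, test = "Chisq")` makes in the example above.

```r
# Toy data (hypothetical): x == 1 perfectly predicts y == 0 (quasi-complete separation).
x <- rep(c(0, 1), each = 20)
y <- c(rep(c(0, 1), 10), rep(0, 20))

fit  <- glm(y ~ x, family = binomial)   # ML estimate for x drifts toward -Inf
fit0 <- glm(y ~ 1, family = binomial)   # restricted model without x

# Wald test: the standard error explodes, so the p-value sits near 1
wald_p <- summary(fit)$coefficients["x", "Pr(>|z|)"]

# LR test: drop in deviance, referred to chi-squared with 1 df
lr_stat <- deviance(fit0) - deviance(fit)
lr_p <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(wald_p = wald_p, lr_stat = lr_stat, lr_p = lr_p)
```

Here the deviance falls by roughly 17.3 (from about 45.0 to about 27.7), so the LR test rejects at conventional levels even though the Wald p-value for `x` is close to 1.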