Logistic Regression with a Small Sample
Source:vignettes/articles/weisiger2014.Rmd
weisiger2014.Rmd
First, load the weisiger2014
data set.
library(tidyverse)
# load weisiger2014 data
weis <- crdata::weisiger2014
# show weisiger2014 data set
glimpse(weis)
#> Rows: 35
#> Columns: 7
#> $ resist <dbl> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0…
#> $ polity_conq <dbl> -6, -5, 3, -6, -6, -9, -9, -5, -9, -4, -9, -9, -4, -9, -7,…
#> $ lndist <dbl> 9.127671, 5.483439, 7.811700, 7.305770, 7.584401, 6.358510…
#> $ terrain <dbl> 31.2, 22.2, 47.6, 1.1, 28.4, 0.0, 0.0, 10.5, 10.5, 21.6, 2…
#> $ soldperterr <dbl> -3.97814370, -3.10807820, -3.89651870, -1.29540790, -4.695…
#> $ gdppc2 <dbl> 685.9000, 1457.0000, 912.1334, 676.0000, 681.7000, 4831.00…
#> $ coord <dbl> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1…
To facilitate interpretation and comparability, rescale the
explanatory variables using arm::rescale()
(i.e., so that
continuous explanatory variables have mean zero and SD 0.5 and the
binary explanatory variables to have mean zero.)
# rescale weisiger2014 explanatory variables using arm::rescale()
rs_weis <- weis %>%
mutate(across(polity_conq:coord, arm::rescale)) %>%
glimpse()
#> Rows: 35
#> Columns: 7
#> $ resist <dbl> 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0…
#> $ polity_conq <dbl> -0.18230763, -0.10050293, 0.55393473, -0.18230763, -0.1823…
#> $ lndist <dbl> 0.934818228, -0.938114356, 0.258482472, -0.001537566, 0.14…
#> $ terrain <dbl> 0.32513516, 0.09430898, 0.74575175, -0.44685017, 0.2533225…
#> $ soldperterr <dbl> -0.641316029, -0.467859669, -0.625043259, -0.106485507, -0…
#> $ gdppc2 <dbl> -0.354801313, -0.127853930, -0.288217123, -0.357715045, -0…
#> $ coord <dbl> 0.7428571, -0.2571429, 0.7428571, 0.7428571, -0.2571429, -…
Now we fit three models.
- Least squares, which replicates Model 3 of Table 2 of Weisiger (2014).
- A logit alternative, fit with maximum likelihood.
- A logit alternative, fit with penalized maximum likelihood using the {brglm2} package.
# fit models
f <- resist ~ polity_conq + lndist + terrain + soldperterr + gdppc2 + coord
ls <- lm(f, data = rs_weis) # lpm (replicates Table 2 Model 3 of Weisiger 2014)
mle <- glm(f, data = rs_weis, family = "binomial") # logistic regression
pmle <- glm(f, data = rs_weis, family = "binomial", method = brglm2::brglmFit) # logistic regression
To interpret the results, we might focus on Weisiger’s Hypothesis 12.
Hypothesis: Resistance will be more likely when the pre-war political leader remains at large in the conquered country.
The table below compares the estimates.
modelsummary::modelsummary(list("Least Squares" = ls,
"Logit via MLE" = mle,
"Logit via PMLE" = pmle))
Least Squares | Logit via MLE | Logit via PMLE | |
---|---|---|---|
(Intercept) | 0.400 | −0.731 | −0.477 |
(0.058) | (0.832) | (0.523) | |
polity_conq | −0.275 | −4.208 | −2.277 |
(0.165) | (2.689) | (1.657) | |
lndist | 0.473 | 6.886 | 3.402 |
(0.161) | (3.666) | (1.828) | |
terrain | 0.202 | 1.320 | 1.102 |
(0.148) | (2.323) | (1.358) | |
soldperterr | −0.128 | −0.302 | −0.595 |
(0.171) | (1.997) | (1.419) | |
gdppc2 | −0.129 | −3.275 | −1.154 |
(0.154) | (3.553) | (1.833) | |
coord | 0.436 | 5.648 | 3.051 |
(0.152) | (3.087) | (1.548) | |
Num.Obs. | 35 | 35 | 35 |
R2 | 0.608 | ||
R2 Adj. | 0.524 | ||
AIC | 32.6 | 29.5 | 32.1 |
BIC | 45.0 | 40.4 | 43.0 |
Log.Lik. | −8.291 | −7.756 | −9.074 |
F | 7.244 | 1.210 | 1.576 |
RMSE | 0.31 | 0.27 | 0.27 |
The plot below compares the estimates. Notice that Firth’s PML estimator produces smaller estimates and CIs.