Firth’s Logit: Some References

logistic regression
small samples
In this note, I collect some older and more recent work on Firth’s penalized maximum likelihood estimator.

Carlisle Rainey


August 30, 2023

In Rainey and McCaskey (2021), Kelly McCaskey and I offer an accessible and practical (re)introduction to Firth’s penalized maximum likelihood estimator that (1) corrects the small sample bias and (2) reduces the excessive variance of the usual maximum likelihood estimator.

Below, I bookmark other references that might be helpful.

I’m sure there are embarrassing omissions. If you see an omission, please let me know (self-promotion is encouraged, especially not-yet-published papers).

This Stack Exchange answer gives a brief but careful explanation of Firth’s logit. If you’re looking for a quick explanation, start here.

The Two Main Papers

  1. Firth (1993) originally introduced the idea. Kelly and I draw mostly on this paper—it’s a wonderful paper.
  2. Kosmidis and Firth (2021) follow up with additional theoretical results that are relevant for the estimator as used in practice since 1993. This happened to come out while our paper was working its way through the publication process. Most importantly, they discuss the shrinkage property of the estimator, which is what Kelly and I highlight as under-appreciated (and really important!).

From my perspective, these are the two main papers to refer to if you’re concerned about small sample bias in logistic regression models.
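For quick reference, here is the objective that both papers study, written for the logit case. The notation is my own shorthand and is not copied from either paper: $\ell(\beta)$ is the usual log-likelihood, $X$ is the design matrix, and $p_i = \operatorname{logit}^{-1}(x_i^{\top}\beta)$.

$$
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2} \log \left| I(\beta) \right|,
\qquad
I(\beta) = X^{\top} W(\beta) X,
\qquad
W(\beta) = \operatorname{diag}\{\, p_i (1 - p_i) \,\}
$$

The added term is the log of the Jeffreys prior, so maximizing $\ell^{*}$ is the same as finding the posterior mode under that prior. Because $|I(\beta)|$ is largest when the fitted probabilities are near 1/2, the penalty pulls the estimates toward zero, which is the shrinkage property mentioned above.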


Beyond these two main papers, there have been a few extensions. Zietkiewicz and Kosmidis (2023) talk about Firth’s logit in very large data sets. Cook, Hays, and Franzese (2018) make a good argument for using Firth’s estimator in panel data sets with binary outcomes and fixed effects. Sterzinger and Kosmidis (2023) apply these ideas to mixed models (or random effects models). Šinkovec et al. (2021) compare Firth’s approach to ridge regression, and suggest that Firth’s is superior in small or sparse data sets. Puhr et al. (2017) study Firth’s logit in the context of rare events and propose FLIC and FLAC as alternatives.


  • Röver et al. (2022) offer an application of Firth’s logit to clinical trials.
  • Turner and Firth (2012) offer an application to Bradley-Terry models with the {BradleyTerry2} R package.

Separation and Finiteness

I learned about Firth’s estimator from Zorn (2005), who follows Heinze and Schemper (2002) in suggesting it as a solution to separation. According to David Firth in this blog post, this is the application that stimulated interest in the approach after it went relatively unnoticed for a few years. (Great post, I highly recommend reading it!) This application piqued my interest in Firth’s estimator. Briefly, I think Firth’s default penalty might not be substantively reasonable in a given application (Rainey 2016; see also Beiser-McGrath 2020) and the usual likelihood ratio and score tests work well without the penalty (Rainey 2023).
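To make the finiteness point concrete, here is a minimal sketch in plain Python of Firth’s logit for an intercept and a single covariate. It iterates on the modified score $\sum_i \{y_i - p_i + h_i(1/2 - p_i)\} x_i$, where $h_i$ is the $i$th hat value. The function name `firth_logit` and the toy data are made up for illustration; this is not the code from any of the papers or packages mentioned in this note, and for real work an established implementation is a better choice.

```python
import math

def firth_logit(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logit with an intercept and one covariate (illustrative sketch)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Fitted probabilities and logit weights p_i * (1 - p_i)
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]

        # Fisher information X'WX for the design [1, x], inverted by hand (2 x 2)
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det

        # Hat values h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]

        # Modified score: the usual residual plus Firth's adjustment h_i * (1/2 - p_i)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))

        # Scoring step: beta <- beta + (X'WX)^{-1} U*(beta)
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Toy data with quasi-complete separation: y = 1 whenever x = 1,
# so the ordinary maximum likelihood slope is not finite.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 1, 0, 1, 1, 1, 1]
print(firth_logit(x, y))
```

With a single binary covariate the model is saturated, and Firth’s estimate matches adding 1/2 to each cell of the 2×2 table. Here that gives an intercept of log(1.5/3.5) and a slope of log(21), both finite even though the data are separated.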

For more on Firth’s logit, see Ioannis Kosmidis’ research page and Georg Heinze’s Google Scholar page.


Beiser-McGrath, Liam F. 2020. “Separation and Rare Events.” Political Science Research and Methods 10 (2): 428–37.
Cook, Scott J., Jude C. Hays, and Robert J. Franzese. 2018. “Fixed Effects in Rare Events Data: A Penalized Maximum Likelihood Solution.” Political Science Research and Methods 8 (1): 92–105.
Firth, David. 1993. “Bias Reduction of Maximum Likelihood Estimates.” Biometrika 80 (1): 27–38.
Heinze, Georg, and Michael Schemper. 2002. “A Solution to the Problem of Separation in Logistic Regression.” Statistics in Medicine 21 (16): 2409–19.
Kosmidis, Ioannis, and David Firth. 2021. “Jeffreys-Prior Penalty, Finiteness and Shrinkage in Binomial-Response Generalized Linear Models.” Biometrika 108 (1): 71–82.
Puhr, Rainer, Georg Heinze, Mariana Nold, Lara Lusa, and Angelika Geroldinger. 2017. “Firth’s Logistic Regression with Rare Events: Accurate Effect Estimates and Predictions?” Statistics in Medicine.
Rainey, Carlisle. 2016. “Dealing with Separation in Logistic Regression Models.” Political Analysis 24 (3): 339–55.
———. 2023. “Hypothesis Tests Under Separation.”
Rainey, Carlisle, and Kelly McCaskey. 2021. “Estimating Logit Models with Small Samples.” Political Science Research and Methods 9 (3): 549–64.
Röver, Christian, Moreno Ursino, Tim Friede, and Sarah Zohar. 2022. “A Straightforward Meta-Analysis Approach for Oncology Phase I Dose-Finding Studies.” Statistics in Medicine 41 (20): 3915–40.
Šinkovec, Hana, Georg Heinze, Rok Blagus, and Angelika Geroldinger. 2021. “To Tune or Not to Tune, a Case Study of Ridge Logistic Regression in Small or Sparse Datasets.” BMC Medical Research Methodology 21 (1).
Sterzinger, Philipp, and Ioannis Kosmidis. 2023. “Maximum Softly-Penalized Likelihood for Mixed Effects Logistic Regression.” Statistics and Computing 33 (2).
Turner, Heather, and David Firth. 2012. “Bradley-Terry Models in R: The BradleyTerry2 Package.” Journal of Statistical Software 48 (9).
Zietkiewicz, Patrick, and Ioannis Kosmidis. 2023. “Bounded-Memory Adjusted Scores Estimation in Generalized Linear Models with Large Data Sets.”
Zorn, Christopher. 2005. “A Solution to Separation in Binary Response Models.” Political Analysis 13 (2): 157–70.