Fixing the nonconvergence bug in logistic regression with SPLUS and SAS

Georg Heinze *, Meinhard Ploner

Department of Medical Computer Sciences, University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria

Received 5 June 2001; received in revised form 7 May 2002; accepted 21 May 2002

Abstract

When analyzing clinical data with binary outcomes, the parameter estimates and consequently the odds ratio estimates of a logistic model sometimes do not converge to finite values. This phenomenon is due to special conditions in a data set and is known as 'separation'. Statistical software packages for logistic regression using the maximum likelihood method cannot appropriately deal with this problem. A new procedure to solve the problem has been proposed by Heinze and Schemper (Stat. Med. 21 (2002) pp. 2409–2419). It has been shown that, unlike the standard maximum likelihood method, this method always leads to finite parameter estimates. We developed a SAS macro and an SPLUS library to make this method available from within these widely used statistical software packages. Our programs are also capable of performing interval estimation based on the profile penalized log likelihood (PPL) and of plotting the PPL function, as was suggested by Heinze and Schemper (Stat. Med. 21 (2002) pp. 2409–2419).

© 2002 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: Monotone likelihood; Nonexistence of parameter estimates; Penalized likelihood; Separation

1. Introduction

For analyzing clinical studies with binary outcomes, the logistic regression model [1,2] is often used. The straightforward interpretation of the estimated parameters as log odds ratios has contributed to its popularity in medical research, and the ability to include more than one covariate in the model allows estimation of odds ratios that are adjusted for other covariates [2].
Parameter estimation is usually based on maximization of the (log) likelihood function (maximum likelihood method) via an iteratively weighted least-squares algorithm [3]. However, it is also known that there are certain situations, occurring particularly in samples where the number of parameters is high relative to the sample size, in which finite maximum likelihood parameter estimates do not exist. In those cases the likelihood converges to a finite value while at least one parameter estimate diverges to ±∞ [4]. This phenomenon is due to special conditions in a

* Corresponding author. Tel.: +43-1-40400-6684; fax: +43-1-40400-6687.
E-mail address: georg.heinze@akh-wien.ac.at (G. Heinze).

Computer Methods and Programs in Biomedicine 71 (2003) 181–187
www.elsevier.com/locate/cmpb
0169-2607/02/$ - see front matter © 2002 Elsevier Science Ireland Ltd. All rights reserved.
PII: S0169-2607(02)00088-3
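The divergence described in the Introduction (a log likelihood that approaches a finite limit while a parameter estimate grows without bound) can be reproduced on a toy data set. The following Python sketch is an illustration only and is not taken from the SAS macro or the SPLUS library. It assumes a hypothetical one-parameter model logit P(y = 1) = beta * x without intercept, four made-up observations in which the sign of x perfectly predicts y, and a plain Newton-Raphson update, which for the logistic model coincides with iteratively reweighted least squares. The names `newton_trace`, `log_sigmoid`, `X_SEPARATED` and `Y_SEPARATED` are introduced here for illustration.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def newton_trace(x, y, n_iter=15):
    """Newton-Raphson for the model logit P(y=1) = beta * x (no intercept).

    Returns a list of (beta, log likelihood) pairs, one per iteration.
    The iteration count is kept small on purpose: under separation the
    information term shrinks toward zero as beta grows.
    """
    beta = 0.0
    trace = []
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
        # Score (first derivative) and information (negative second derivative).
        score = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        info = sum(xi * xi * pi * (1.0 - pi) for xi, pi in zip(x, p))
        beta += score / info
        loglik = sum(
            log_sigmoid(beta * xi if yi == 1 else -beta * xi)
            for xi, yi in zip(x, y)
        )
        trace.append((beta, loglik))
    return trace


# Hypothetical toy data: y = 1 exactly when x > 0, i.e. complete separation.
X_SEPARATED = [-2.0, -1.0, 1.0, 2.0]
Y_SEPARATED = [0, 0, 1, 1]

if __name__ == "__main__":
    for i, (b, ll) in enumerate(newton_trace(X_SEPARATED, Y_SEPARATED), 1):
        print(f"iteration {i:2d}: beta = {b:8.4f}, log likelihood = {ll:.3e}")
```

On these data the score is positive for every finite beta, so each Newton step moves beta further upward while the log likelihood rises toward its supremum of zero without ever attaining it. A convergence criterion based only on the change in the log likelihood would therefore eventually report success even though no finite estimate exists.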