Journal of Mathematics and Statistics 5 (4): 387-394, 2009
ISSN 1549-3644
© 2009 Science Publications
Corresponding Author: S.K. Sarkar, Laboratory of Applied and Computational Statistics, Institute for Mathematical Research,
University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
Optimization Techniques for Variable Selection in Binary Logistic Regression Model
Applied to Desire for Children Data
S.K. Sarkar and Habshah Midi
Laboratory of Applied and Computational Statistics, Institute for Mathematical Research,
University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
Abstract: Problem statement: The population problem is the biggest problem facing the world. In the
global and regional context, Bangladesh's population has drawn considerable attention from social
scientists, policy makers and international organizations. Bangladesh is now the world's 10th most populous
country, with about 140 million people. The recent experience of Bangladesh shows that fertility can
sustain impressive declines even when women's lives remain severely constrained. Recent statistics
also suggest that, despite a continuing increase in the contraceptive prevalence rate (56%), the expected
fertility decline in Bangladesh has stalled. Approach: The purpose of this study was to explore the
possibility of further fertility decline in Bangladesh, with special attention to identifying social and
demographic factors that predict the desire for more children, using stepwise and
best subsets logistic regression approaches. The study compared the two approaches to determine an
optimum model for predicting the outcome. Results: It was found that excess desire for children is
solely responsible for the stalled fertility decline. Conclusion: To overcome this situation, policy makers in
Bangladesh should focus on eliminating regional variations in the desire for more children
and introduce awareness programs among rural women about the benefits of smaller families.
Key words: Best subsets, stepwise logistic regression, design variables, Mallows' Cp, score test
INTRODUCTION
The population problem is the biggest problem in
the world today. It makes every other problem worse
and harder to solve. The world's population is expected
to grow by another 2.3 billion, from 6.8 billion in 2009
to 9.1 billion in 2050, and most of this growth will take
place in developing countries. In the global and
regional context, Bangladesh's population has drawn
considerable attention from social scientists, policy
makers and international organizations. Bangladesh is
now the world's 10th most populous country, with about
140 million people. According to the United Nations
and other agencies, the population growth rate of
Bangladesh is still 1.65% per year. If this rate continues,
the population of Bangladesh will roughly double by 2050.
Unless action is taken to accelerate the reduction in growth
rates, the population of the world will not stabilize
and certain regions and countries, such as Bangladesh, will
go far beyond the limits consistent with political
stability and acceptable social and economic conditions.
However, recent statistics suggest that, despite a
continuing increase in the contraceptive prevalence rate
(55.8%), the fertility decline in Bangladesh has stalled.
The total fertility rate is still 3.1, well above the
replacement-level fertility rate of 2.1. Further fertility
decline is required to achieve a stable population in
Bangladesh[14].
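The doubling claim can be checked with the standard doubling-time formula for compound growth, t = ln 2 / ln(1 + r). The sketch below is a minimal illustration, assuming the 1.65% annual growth rate stays constant from 2009 onward; the function name `doubling_time` is hypothetical.

```python
import math


def doubling_time(rate):
    """Years for a population to double under a constant annual growth rate.

    Assumes compound (geometric) growth, i.e. solves (1 + rate) ** t == 2 for t.
    `rate` is a fraction, e.g. 0.0165 for 1.65% per year.
    """
    return math.log(2) / math.log1p(rate)


# Assumption: the 1.65% rate holds constant, starting from 2009.
t = doubling_time(0.0165)
print(f"doubling time: {t:.1f} years, i.e. around {2009 + t:.0f}")
```

With r = 0.0165 this gives about 42.4 years, i.e. a doubling around 2051, which is consistent with the 2050 horizon quoted in the text.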
The purpose of this study is to explore the
possibility of further fertility decline in Bangladesh,
with special attention to identifying crucial social
and demographic factors that predict the desire
for more children. The study
provides a simple explanation and demonstration of
how to obtain a best subsets solution in logistic
regression and how to interpret the results.
The criteria for including a variable in a model may
vary from one problem to the next and from one
scientific discipline to another. The traditional approach
to statistical model building involves seeking the most
parsimonious model that still explains the data. There
are several steps one can follow to aid in the selection
of variables for a logistic regression. The present study
discusses stepwise and best subsets logistic regression
for variable selection and compares them to determine a
parsimonious model. Variables must be selected
carefully so that the model makes accurate predictions
without overfitting the data. Selecting variables by