International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 6 (2018) pp. 3139-3143
© Research India Publications. http://www.ripublication.com
Zero Inflated Binomial Model for Infant Mortality Data in Indonesia
Wahyu Bodromurti 1,a), Khairil Anwar Notodiputro 2,b), and Anang Kurnia 3,c)
1,2,3 Department of Statistics, Bogor Agricultural University, Kampus IPB Darmaga, Bogor, Indonesia.
Abstract
This paper discusses overdispersed binomial models applied
to infant mortality data in Indonesia. Overdispersion often
occurs when the data contain many zeros, known as excess
zeros. In such cases, binomial models fit poorly and the type
I error can be inflated, that is, the false positive rate is
higher. This problem can be addressed by using zero inflated
binomial (ZIB) models. Hall (2000) developed ZIB models by
modifying the zero inflated Poisson (ZIP) models developed
by Lambert (1992). In ZIB models, the response variable
is assumed to follow a mixture of a binomial (n, π)
distribution and a distribution of the binary zero-indicator,
with mixing probability p. The fit of the model was assessed
using ROC curves as well as other criteria such as AIC, AICC,
and BIC. The results showed that the ZIB model gives a better
fit to the overdispersed binomial data.
Keywords: excess zeros, overdispersion, infant mortality,
zero inflated binomial.
INTRODUCTION
Background
Binary count data with success probability π and upper bound n
usually follow a binomial (n, π) distribution and can usually
be analyzed using binomial models. If the variation is greater
than the model assumes, the binomial data are called
overdispersed (Hinde and Demetrio 2007). Overdispersion
can be caused by excess zeros. Hinde and Demetrio (2007)
noted that overdispersion may lead to underestimated
standard errors, which in turn produce p-values that are too
small. This means that non-significant associations may
appear to be significant. In other words, overdispersion can
produce higher false positive rates that affect the validity
of inferences. Zero inflated binomial (ZIB) models can be used
to overcome the overdispersion problem. Lambert (1992)
developed zero inflated Poisson (ZIP) regression models, and
Hall (2000) adapted ZIP models into ZIB models.
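The link between excess zeros and overdispersion described above can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes that the mixing probability p is the probability of a structural zero and that the response is otherwise drawn from binomial (n, π). The function names and the parameter values (n = 20, π = 0.1, p = 0.4) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_zib(n_trials, pi, p_zero, size, rng):
    """Draw ZIB counts: a structural zero with probability p_zero,
    otherwise a Binomial(n_trials, pi) count (which may itself be zero)."""
    structural_zero = rng.random(size) < p_zero
    binomial_counts = rng.binomial(n_trials, pi, size=size)
    return np.where(structural_zero, 0, binomial_counts)


def zib_mean_var(n_trials, pi, p_zero):
    """Theoretical mean and variance of the ZIB mixture."""
    mean = (1 - p_zero) * n_trials * pi
    var = ((1 - p_zero) * n_trials * pi * (1 - pi)
           + p_zero * (1 - p_zero) * (n_trials * pi) ** 2)
    return mean, var


def matched_binomial_var(n_trials, mean):
    """Variance of a plain binomial with the same n and the same mean."""
    return mean * (1 - mean / n_trials)


# Hypothetical values for illustration only (not taken from the paper).
rng = np.random.default_rng(0)
n_trials, pi, p_zero = 20, 0.1, 0.4
y = simulate_zib(n_trials, pi, p_zero, size=100_000, rng=rng)

mean, var = zib_mean_var(n_trials, pi, p_zero)
print("share of zeros:", np.mean(y == 0))
print("sample variance:", y.var(), "ZIB variance:", var)
print("binomial variance at same mean:", matched_binomial_var(n_trials, mean))
```

Under these assumptions the ZIB variance exceeds that of a binomial with the same mean by (1 - p) n π · p π (n - 1), which is positive whenever n > 1 and 0 < p < 1. This is the extra variation that a plain binomial model fails to capture.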
Infant mortality is a binary event within a given period, so
the number of infant deaths in each village generally follows
a binomial distribution with probability π of death among
n births. Through the Indonesia Demographic and Health
Survey (IDHS), infant mortality data were recorded over five
years (2008-2012). Since the number of infant deaths is
usually small, these data are very likely to suffer from
overdispersion. According to the World Health
Organization (WHO), in 2015 about 75% of under-five
deaths occurred in the first year of life, or around 4.5 million
babies. According to the United Nations Children’s
Fund (UNICEF Indonesia 2012), high infant mortality rates
are associated with babies from rural households, babies of
less educated mothers, delivery at home or at a delivery post
instead of in a health facility, low birth weight babies (LBWB),
birth order fourth to sixth, maternal age at delivery above
30 years, babies who are not breastfed or are breastfed for
less than one year, and two births within the last three
years. This paper discusses ZIB models
for infant mortality data in Indonesia. The response variable is
the number of infant deaths in each village and the explanatory
variables are factors determining the patterns of infant
mortality. The fit of the ZIB model is assessed using
Receiver Operating Characteristic (ROC) curves as well as
other criteria such as Akaike’s Information Criterion (AIC),
the corrected Akaike’s Information Criterion (AICC), and
the Bayesian Information Criterion (BIC).
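For reference, the three information criteria named above can be computed from a fitted model's maximized log-likelihood. The sketch below uses the standard textbook definitions, which may differ in detail from the convention of the software used in this study. The function names and the numeric inputs are hypothetical and are not results from the paper.

```python
import math


def aic(log_lik, k):
    """Akaike's Information Criterion: -2 log L + 2k."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik, k, n_obs):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n_obs - k - 1 <= 0:
        raise ValueError("AICC requires n_obs > k + 1")
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n_obs - k - 1)


def bic(log_lik, k, n_obs):
    """Bayesian Information Criterion: -2 log L + k log(n)."""
    return -2.0 * log_lik + k * math.log(n_obs)


# Hypothetical inputs: log-likelihood -100 with 3 parameters and 50 observations.
print(aic(-100.0, 3), aicc(-100.0, 3, 50), bic(-100.0, 3, 50))
```

For all three criteria a smaller value indicates a better trade-off between fit and model complexity, so competing binomial and ZIB fits to the same data can be compared directly.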
OBJECTIVES
The research objectives are:
1. To understand the application of ZIB models in
analyzing data on infant deaths occurring in the
first year of life in West Java, Indonesia.
2. To investigate the performance of the ZIB model and to
assess the model using ROC curves as well as other
criteria such as AIC, AICC, and BIC.
THEORETICAL REVIEW
Zero Inflated Binomial Model
Overdispersion in GLMs may be due to variability of
experimental material, correlation between individual
responses, cluster sampling, the aggregation level of the data,
and omitted unobserved variables (Hinde and Demetrio 2007). In
some cases, the cause of the overdispersion can be
recognized from the nature of the data, such as excess zeros,
which lead to greater variance than the model assumes.
Overdispersed binomial data can be fitted better using ZIB
models. In ZIB models, the response variable is assumed to
follow a mixture of a binomial (n, π) distribution
and a distribution of the binary zero-indicator,
with mixing probability p. Overdispersed binomial data are
modeled in ZIB models by Hall (2000), which is the response