International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 6 (2018) pp. 3139-3143 © Research India Publications. http://www.ripublication.com

Zero Inflated Binomial Model for Infant Mortality Data in Indonesia

Wahyu Bodromurti 1,a), Khairil Anwar Notodiputro 2,b), and Anang Kurnia 3,c)

1,2,3 Department of Statistics, Bogor Agricultural University, Kampus IPB Darmaga, Bogor, Indonesia.

Abstract

This paper discusses overdispersed binomial models applied to infant mortality data in Indonesia. Overdispersion often occurs when the data contain many zeros, a condition known as excess zeros. In such cases, binomial models fit poorly and the type I error rate can be inflated, that is, false positives become more frequent. This problem can be resolved by using zero inflated binomial (ZIB) models. Hall (2000) obtained ZIB models by modifying the zero inflated Poisson (ZIP) models developed by Lambert (1992). In ZIB models, the response variable is assumed to be distributed as a mixture of a binomial (n, π) distribution and a distribution of the binary zero-indicator, with mixing probability p. The fit of the model was assessed using ROC curves as well as other criteria such as AIC, AICC, and BIC. The results showed that the ZIB model gives a better fit to the overdispersed binomial data.

Keywords: excess zeros, overdispersion, infant mortality, zero inflated binomial.

INTRODUCTION

Background

Count data arising from binary outcomes, with success probability π and upper bound n, usually follow a binomial (n, π) distribution and can be analyzed using binomial models. If the observed variation is greater than that implied by the assumed model, the binomial data are called overdispersed (Hinde and Demetrio 2007). Overdispersion can be caused by excess zeros. Hinde and Demetrio (2007) note that overdispersion may lead to underestimated standard errors, which in turn produce underestimated p-values.
This means that non-significant associations may appear to be significant. In addition, overdispersion can produce higher false positive rates, which affects the validity of inferences. Zero inflated binomial (ZIB) models can be used to overcome the overdispersion problem. Lambert (1992) developed zero inflated Poisson (ZIP) regression models, and Hall (2000) modified ZIP into ZIB models.

Infant mortality is a binary event within a given period; hence the number of infant deaths in each village generally follows a binomial distribution, with the probability of death among births as the success probability. Through the Indonesia Demographic and Health Survey (IDHS), infant mortality data were recorded over five years (2008-2012). Since the number of infant deaths is usually small, many villages are likely to record zero deaths, and the data are therefore very likely to suffer from overdispersion.

According to the World Health Organization (WHO), in 2015 about 75% of under-five deaths occurred in the first year of life, or around 4.5 million babies. According to the United Nations Children's Emergency Fund in Indonesia (UNICEF Indonesia 2012), high infant mortality rates are associated with babies from rural households, babies of less educated mothers, delivery at home or at a delivery post rather than in a health facility, low birth weight babies (LBWB), birth order 4th to 6th, maternal age at delivery above 30 years, babies who are not breastfed or are breastfed for less than one year, and two births within the last three years.

This paper discusses ZIB models for infant mortality data in Indonesia. The response variable is the number of infant deaths in each village, and the explanatory variables are the factors that determine the patterns of infant mortality. The fit of the ZIB model is assessed using Receiver Operating Characteristic (ROC) curves as well as other criteria such as Akaike's Information Criterion (AIC), Akaike's Information Criterion Corrected (AICC), and the Bayesian Information Criterion (BIC).
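As a rough illustration of the fit criteria named above, the following Python sketch evaluates a ZIB log-likelihood and derives AIC, AICC, and BIC from it. It is not the authors' implementation. The function names, the toy village counts, and the parameter values are hypothetical, and the parameterization (p taken as the probability of a structural zero, with a binomial (n, π) count otherwise) is an assumption made for the example.

```python
import math


def zib_pmf(y, n, pi, p):
    """P(Y = y) under a zero inflated binomial.

    Assumption: with probability p the count is a structural zero,
    and with probability 1 - p it is Binomial(n, pi).
    """
    binom = math.comb(n, y) * pi**y * (1.0 - pi) ** (n - y)
    return p * (y == 0) + (1.0 - p) * binom


def zib_loglik(ys, ns, pi, p):
    """Log-likelihood of independent counts ys with upper bounds ns."""
    return sum(math.log(zib_pmf(y, n, pi, p)) for y, n in zip(ys, ns))


def information_criteria(loglik, k, n_obs):
    """AIC, AICC and BIC from a log-likelihood, k parameters, n_obs observations."""
    if n_obs - k - 1 <= 0:
        raise ValueError("AICC needs n_obs > k + 1")
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n_obs - k - 1)
    bic = k * math.log(n_obs) - 2 * loglik
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


# Hypothetical toy data: infant deaths and births in ten villages.
deaths = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]
births = [12, 8, 15, 20, 9, 25, 11, 7, 18, 10]

# Illustrative parameter values, not fitted estimates; in practice the
# criteria are computed at the maximum likelihood estimates.
loglik = zib_loglik(deaths, births, pi=0.05, p=0.4)
print(information_criteria(loglik, k=2, n_obs=len(deaths)))
```

Smaller values of each criterion indicate a better trade-off between fit and model size, so the same function could be applied to a plain binomial fit and a ZIB fit of the same data to compare them.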
OBJECTIVES

The research objectives are:
1. To understand the application of ZIB models in analyzing infant mortality data, that is, deaths occurring in the first year of life, in West Java, Indonesia.
2. To investigate the performance of the ZIB model and to assess the model using ROC curves as well as other criteria such as AIC, AICC, and BIC.

THEORETICAL REVIEW

Zero Inflated Binomial Model

Overdispersion in generalized linear models (GLMs) may be due to variability of experimental material, correlation between individual responses, cluster sampling, the aggregation level of the data, and omitted unobserved variables (Hinde and Demetrio 2007). In some cases, the cause of the overdispersion may be recognized from the nature of the data, such as excess zeros, which lead to a greater variance than the assumed model implies. Overdispersed binomial data can be fitted better using ZIB models. In ZIB models, the response variable is assumed to be distributed as a mixture of a binomial (n, π) distribution and a distribution of the binary zero-indicator, with mixing probability p. Overdispersed binomial data are modeled in ZIB models by Hall (2000), which is the response