International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2016): 79.57 | Impact Factor (2015): 6.391
Volume 7 Issue 2, February 2018
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY

Modelling the US Diabetes Mortality Rates via Generalized Linear Model with the Tweedie Distribution

Oznur Ozaltin 1, Neslihan Iyit 2, 3

1 Ataturk University, Faculty of Science, Department of Mathematics, Erzurum, Turkey
2 Selcuk University, Faculty of Science, Department of Statistics, Alaeddin Keykubat Campus, Konya, Turkey
3 Corresponding Author E-mail: niyit@selcuk.edu.tr

Abstract: In this study, we model the US diabetes mortality rate as the response variable in terms of different types of neoplasms, endocrine, nutritional and metabolic diseases, musculoskeletal system diseases, obesity, sugar intake, and alcohol use disorder via a generalized linear model (GLM) with the Tweedie distribution. First, we examine how changing the variance power parameter and the index (power) of the power link function affects the AIC goodness-of-fit criterion, the Pearson chi-square and deviance estimates of the dispersion parameter, and the residuals of the Tweedie GLMs fitted to the US diabetes mortality data. The best model uses the identity link, that is, variance power parameter 1.9 and link function power 1 for the Tweedie distribution. Second, we emphasize the importance of diagnostic plots based on residuals, Cook's distance, and leverage for identifying extreme observations that may cause problems for parameter estimation, hypothesis tests, and statistical inference in the Tweedie GLM for the US diabetes mortality data.

Keywords: Diabetes mortality, generalized linear model, Tweedie distribution, link function

1. Introduction

Diabetes is a chronic disease that occurs either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces. The World Health Organization (WHO) projects that diabetes will be the seventh leading cause of death in 2030 [44]. Many studies have investigated diabetes mortality from different aspects to better understand the risk factors that affect it. In particular, diabetes is one of the most important causes of early mortality in the US, more so than in many European countries [37]. Goodkin (1975), Fuller et al. (1983), Williamson et al. (2000), Cifuentes et al. (2000), Kaati et al. (2002), Wen et al. (2005), Franco et al. (2007) and Secrest et al. (2014) investigated diabetes mortality risk factors from different perspectives, and the literature on diabetes mortality is extensive.

In recent years, generalized linear models (GLMs) have gained popularity. These are regression models based on the exponential family of distributions, and they can flexibly model the probability distribution of both continuous and discrete response variables. Nelder and Wedderburn (1972), Cameron and Trivedi (1986), McCullagh and Nelder (1989), Firth (1991), Liao (1994), Blough et al. (1999), Lindsey (2000), Diggle (2002), Renshaw and Haberman (2003), Dobson and Barnett (2008), Fitzmaurice et al. (2012), Grover et al. (2013), Agresti (2015) and Iyit et al. (2016) studied the GLM approach in detail. GLMs whose response variable follows the Tweedie distribution, a member of the class of mixed distributions known as the Tweedie family, have attracted great interest in statistical modelling, especially in actuarial science and risk modelling.
Jorgensen and Paes De Souza (1994), Dunn and Smyth (2001), Smyth and Jorgensen (2002), Wuthrich (2003), Candy (2004), Dunn (2004), Dunn and Smyth (2005), Kaas (2005), Dunn and Smyth (2008), Shono (2008), Brown and Dunn (2011), Zhang (2013) and Simsekli et al. (2015) are useful references on GLMs with the Tweedie distribution. In this study, we focus on modelling the US diabetes mortality rate as the response variable in terms of different types of neoplasms, endocrine, nutritional and metabolic diseases, musculoskeletal system diseases, obesity, sugar intake, and alcohol use disorder via a GLM with the Tweedie distribution. The Tweedie distribution has attractive features but is still not well known or widely used. As a result, no such analysis has previously been carried out for diabetes mortality data in the literature.

2. Materials and Method

Generalized linear models (GLMs) are regression models in which the distribution of the response variable comes from the exponential family. A monotonic, differentiable link function $g$ linearizes the relationship between the mean of the response variable, $\mu = E(Y)$, and the linear predictor $X\beta$, so that $g(\mu) = X\beta$ [21]. The Tweedie distribution [41] is useful for modelling a non-negative response variable, $y \geq 0$. Because the Tweedie distribution belongs to the exponential family, a Tweedie-distributed response variable can be modelled with the GLM approach. The variance of the Tweedie-distributed response variable is

$$V(Y) = \phi\,\mu^{p} \qquad (1)$$

Paper ID: ART2018368 DOI: 10.21275/ART2018368 1326
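The following short sketch makes Equation (1) and the power link family concrete. It is illustrative only, not the authors' code. The function names are hypothetical, and the convention that link power 0 stands for the log link is a common software convention assumed here (R's `statmod::tweedie` uses it, for example). The loop shows the familiar special cases of the variance power $p$: 0 gives the normal, 1 the Poisson, 2 the gamma, and 3 the inverse Gaussian distribution. Values with $1 < p < 2$ give the compound Poisson-gamma case, which allows exact zeros.

```python
import math


def tweedie_variance(mu: float, phi: float, p: float) -> float:
    """Variance of a Tweedie response with mean mu, V(Y) = phi * mu**p (Eq. 1)."""
    if mu <= 0 or phi <= 0:
        raise ValueError("mu and phi must be positive")
    return phi * mu ** p


def power_link(mu: float, q: float) -> float:
    """Power link g(mu) = mu**q; by the assumed convention q = 0 means log link."""
    return math.log(mu) if q == 0 else mu ** q


def inverse_power_link(eta: float, q: float) -> float:
    """Inverse power link mu = eta**(1/q), or exp(eta) when q = 0.

    Assumes eta lies in the valid range of the link (e.g. eta > 0 for q = 0.5).
    """
    return math.exp(eta) if q == 0 else eta ** (1.0 / q)


# Variance at mu = 2, phi = 1 for several variance powers:
# p = 0 normal, p = 1 Poisson, 1 < p < 2 compound Poisson-gamma,
# p = 2 gamma, p = 3 inverse Gaussian.
for p in (0, 1, 1.9, 2, 3):
    print(f"p = {p}: V(Y) = {tweedie_variance(2.0, 1.0, p):.4f}")

# With link power q = 1 the power link reduces to the identity link g(mu) = mu.
print(power_link(3.5, 1), inverse_power_link(3.5, 1))
```

This shows why the identity link reported for the US diabetes mortality data corresponds to link power 1: the power link $\mu^{q}$ with $q = 1$ leaves the mean unchanged.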
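The following sketch shows how a Tweedie GLM with the identity link can be estimated by iteratively reweighted least squares (IRLS). It also computes the Pearson chi-square and deviance estimates of the dispersion parameter, $\hat\phi_P = X^2/(n-k)$ and $\hat\phi_D = D/(n-k)$. It is a minimal illustration under stated assumptions, not the authors' implementation. For brevity it uses a single covariate, whereas the paper uses several. The helper `fit_tweedie_identity` and the toy data are hypothetical, and the unit deviance formula is written only for $1 < p < 2$. With the identity link, $g'(\mu) = 1$, so the IRLS working response reduces to $y$ and the weights to $1/\mu^{p}$. The dispersion $\phi$ cancels from the coefficient updates.

```python
def tweedie_unit_deviance(y: float, mu: float, p: float) -> float:
    """Tweedie unit deviance d(y, mu) for 1 < p < 2, with y >= 0 and mu > 0."""
    if not 1.0 < p < 2.0:
        raise ValueError("this formula is written for 1 < p < 2")
    term_y = y ** (2 - p) / ((1 - p) * (2 - p))  # vanishes when y == 0
    return 2.0 * (term_y - y * mu ** (1 - p) / (1 - p) + mu ** (2 - p) / (2 - p))


def fit_tweedie_identity(x, y, p, tol=1e-10, max_iter=100):
    """Hypothetical helper: IRLS for mu_i = b0 + b1 * x_i under a Tweedie GLM
    with identity link. Since g'(mu) = 1, the working response equals y and
    the IRLS weights are 1 / mu**p; phi cancels from the coefficient updates."""
    n = len(y)
    mu = [max(yi, 1e-6) for yi in y]  # crude positive starting values
    b0 = b1 = 0.0
    for _ in range(max_iter):
        w = [1.0 / m ** p for m in mu]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        # Weighted least-squares solution of the 2x2 normal equations.
        new_b1 = (sw * swxy - swx * swy) / det
        new_b0 = (swy - new_b1 * swx) / sw
        mu = [new_b0 + new_b1 * xi for xi in x]
        if min(mu) <= 0:
            raise ValueError("identity link gave a non-positive fitted mean")
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    resid_df = n - 2  # two estimated coefficients
    pearson = sum((yi - m) ** 2 / m ** p for yi, m in zip(y, mu))
    deviance = sum(tweedie_unit_deviance(yi, m, p) for yi, m in zip(y, mu))
    return {"b0": b0, "b1": b1, "mu": mu,
            "phi_pearson": pearson / resid_df,
            "phi_deviance": deviance / resid_df}


# Hypothetical toy data (NOT the paper's US diabetes mortality data).
x = [1, 2, 3, 4, 5, 6]
y = [5.2, 7.9, 11.3, 13.8, 17.1, 20.0]
fit = fit_tweedie_identity(x, y, p=1.9)
print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}")
print(f"Pearson phi = {fit['phi_pearson']:.6f}, "
      f"deviance phi = {fit['phi_deviance']:.6f}")
```

For real multi-covariate fits, established GLM software handles the general case. One option is R's `glm()` with the `statmod::tweedie(var.power = 1.9, link.power = 1)` family. A hand-written IRLS like this one is mainly useful for seeing where the variance power enters the weights.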