Modeling the Pollution-Mortality Relationship Using Ridge Estimation and LASSO

Brayan Ortiz
California State University, Fullerton
Department of Mathematics

Abstract

The effects of pollution on human mortality command the attention of many scientists and analysts around the world. A wide range of statistical techniques has been employed to model this relationship. One particular approach used ridge regression, since a multicollinearity problem was evident among the variables. Model selection in that approach was based on Mallows' Cp. I will reselect a model using ridge regression with the Generalized Cross-Validation (GCV) criterion, which is more effective than Mallows' Cp. Following the ridge regression, a Least Absolute Shrinkage and Selection Operator (LASSO) with an L1 penalty will be used to demonstrate a fast selection process that lends itself readily to cross-validation. The use of another possible information variable is suggested, and alternative methods for future analysis are proposed.

Introduction

Investigations into the adverse effects of pollution, or particulate matter, are extensive. The importance the scientific community attaches to this research is evidenced by the number of places around the world conducting the same, or a highly similar, study: an examination of the effects of pollution on human mortality. In China, Ou et al. (2012) examined how dietary habits alongside air pollution affect mortality, showing that dietary habits could alter pollution's effect on a person. Continuing the trend of accounting for confounding factors, Cesaroni et al. (2012) considered the adverse health effects of traffic-related and non-traffic-related pollution in a study conducted in Rome, Italy. Nearby in Western Europe, Scheers et al.
(2011) conducted a study that considered infant mortality (instead of the typical assessment of adult mortality) and concluded that higher levels of particulate matter increased that risk. To underscore the importance of these kinds of studies, consider Hales et al.'s (2010) examination of the entire New Zealand population, where the objective was to assess the sensitivity to pollution, since urbanization is not as rampant in New Zealand as in the aforementioned countries.

The methods employed to analyze these data sets are as varied as the locations of the studies. Logistic regression has been used to assess the probability of mortality