2018 Proceedings of the Conference on Information Systems Applied Research ISSN: 2167-1508 Norfolk, Virginia v11 n 4813
©2018 ISCAP (Information Systems & Computing Academic Professionals) Page 1 http://iscap.info

Effects of Normalization Techniques on Logistic Regression in Data Science

Adekunle Adeyemo
Hayden Wimmer
hayden.himmer@gmail.com
Georgia Southern University
Statesboro, GA, 30458

Loreen Powell
lpowell@bloomu.edu
Bloomsburg University
Bloomsburg, PA 17815

Abstract

Improvements in the data science profession have allowed several mathematical ideas to be applied to social patterns in data. This research investigates how different normalization techniques affect the performance of logistic regression. The original dataset was modeled using the SQL Server Analysis Services (SSAS) Logistic Regression model, which became the baseline model for the research. The normalization methods used to transform the original dataset were described. Next, separate logistic models were built, one for each of the three normalization techniques discussed. This work found that, in terms of accuracy, decimal scaling marginally outperformed min-max and z-score scaling. When Lift was used to evaluate the models, however, decimal scaling and z-score performed slightly better than the min-max method. Future work is recommended to test the regression model on other datasets, specifically those whose dependent variable is a two-category problem or those whose independent attributes vary in magnitude.

Keywords: Normalization, Logistic Regression, Z-Score, Min-Max, Decimal Scaling

1. INTRODUCTION

Advancements in the field of data science have allowed several mathematical concepts to be applied to behavioral patterns in data. In particular, different normalization techniques have been applied to numerous datasets to solve problems from all walks of life.
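
As a point of reference, the three normalization techniques named in the abstract (min-max, z-score, and decimal scaling) can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook definitions; it is not the SSAS workflow used in this research, and the function names `min_max`, `z_score`, and `decimal_scaling` are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map it to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the standard deviation.

    Assumption: the population standard deviation is used; some tools
    use the sample standard deviation instead.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest non-negative integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]
```

All three functions preserve the ordering of the values within an attribute; they differ in the range of the output and in how strongly extreme values influence the scale.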
Data normalization is a preprocessing method used in many data mining systems, particularly for classification algorithms such as neural networks, clustering, and nearest-neighbor classification (Evans, 2016). Many works have been published on data normalization and its application to different fields of human endeavor, including "Statistical Normalization and Back Propagation for Classification," "Min-Max Normalization based on Data Perturbation method for Privacy Protection," "Importance of Data Normalization for the Application of Neural Networks to Complex Industrial Problems," and "The Impact of Normalization Methods on RNA-Seq Data Analysis." In this research, we investigated how different normalization techniques affect the performance of a logistic regression classifier. Logistic regression is an ideal tool for answering