Application of classification techniques on development an early-warning system for chronic illnesses

Chih-Hung Jen (a), Chien-Chih Wang (b, corresponding author), Bernard C. Jiang (c), Yan-Hua Chu (b), Ming-Shu Chen (d)

(a) Department of Information Management, Lunghwa University of Science and Technology, Taiwan
(b) Department of Industrial Engineering and Management, Ming Chi University of Technology, Taiwan
(c) Department of Industrial Engineering and Management, Yuan Ze University, Taiwan
(d) Department of Health Management Center, Far Eastern Memorial Hospital, Taiwan

Keywords: Risk factor; K-nearest neighbor; Linear discriminant analysis; Sequential forward selection; Early-warning criteria

Abstract

Chronic disease has gradually become a major cause of death in Taiwan, and being afflicted with such an illness also increases vulnerability to other complications. This paper therefore adopts a preventive perspective and ascertains the impact of important physiological indicators and clinical test values on various chronic illnesses. It investigates five chronic diseases: hypertension, diabetes, cardiovascular disease, liver disease, and renal disease. Using the risk factors of these diseases to establish early-warning criteria may reduce the rate at which complications occur. The methodology, which uses K-nearest neighbor (KNN), linear discriminant analysis (LDA), and sequential forward selection (SFS), is divided into two parts. The first part classifies healthy persons and persons afflicted with the above-mentioned chronic illnesses, and screens the data to determine their characteristic values. The second part determines the critical values of the important risk factors for each chronic illness and builds early-warning criteria for recognizing these illnesses. Data from a medical center in Taiwan are used to verify the proposed methodology. The results show that both the classification of the data and the screening of important factors are effective, with a correct classification rate of 80%.
Additionally, early-warning criteria built on these important factors can help patients understand their risk of developing these diseases. They can also give medical personnel effective reference criteria for diagnosis.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Because they are linked to obesity and lack of mobility, chronic illnesses have recently replaced infectious diseases as the primary cause of death in Taiwan. These illnesses include cardiovascular/cerebrovascular disease, diabetes mellitus, liver disease, renal disease, and hypertension, among others. They progress gradually, often over years, and the discomfort they cause is often ignored. By the time a symptom becomes apparent, the body's physical condition has already been seriously impaired. The costs of patient care and medical treatment then become a heavy burden, heavier than they would have been had the disease been detected earlier. Moreover, if patients with a chronic illness do not seek active treatment, additional complications associated with the disease may arise. For example, hypertension generally accompanies cardiovascular disease, diabetes mellitus, liver disease, and renal disease (Burtis, Ashwood, & Bruns, 2008; Czyzyk & Szczepanik, 2000; Lewis, 2002; Li, Hasimu, Yu, Wang, & Hu, 2006; Lim, Kim, Choe, Ki, & Park, 2006; Su, Yang, Hsu, & Chiu, 2006; Tapio, Ari, Pirjo, Martina, & Katarina, 2005). High blood pressure is therefore both an important clinical indicator and a symptom of the principal underlying disease. This paper focuses on one of these chronic illnesses, hypertension, which is also a frequent complication of the other four: cardiovascular disease, diabetes mellitus, liver disease, and renal disease.
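The abstract outlines a two-part methodology: first classify and screen features, then set critical values. The sketch below illustrates that idea under stated assumptions. It is not the authors' implementation.

- **Assumptions:** features are already standardized, and the sequential-forward-selection criterion is the leave-one-out accuracy of a K-nearest-neighbor classifier. LDA is omitted.
- **Critical values:** for the second part, the sketch picks a cut-off with Youden's J statistic (sensitivity + specificity − 1). This is a common rule used here as a stand-in, not necessarily the paper's rule.
- **Hypothetical elements:** the function names `knn_predict`, `loo_accuracy`, `sequential_forward_selection`, and `youden_cutoff`, the toy data, and its column labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote of its k nearest training points."""
    # math.dist (Python 3.8+) gives the Euclidean distance between two points.
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        test_x = [X[i][f] for f in features]
        correct += knn_predict(train_X, train_y, test_x, k) == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, n_features, k=3):
    """Greedily add, one at a time, the feature that maximizes LOO accuracy.

    Returns a list of (selected feature indices, accuracy) after each step.
    """
    selected, remaining, history = [], list(range(len(X[0]))), []
    while remaining and len(selected) < n_features:
        scores = {f: loo_accuracy(X, y, selected + [f], k) for f in remaining}
        best = max(scores, key=scores.get)  # ties go to the lower index
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), scores[best]))
    return history


def youden_cutoff(values, labels):
    """Cut-off c maximizing Youden's J for the rule 'value >= c means at risk'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, t in zip(values, labels) if v >= c and t == 1)
        tn = sum(1 for v, t in zip(values, labels) if v < c and t == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical toy data, already standardized (z-scores).
# Columns: [systolic BP, BMI, an uninformative variable]; label 1 = hypertensive.
X = [
    [-1.0, 0.2, 0.5], [-0.8, -0.1, -0.4], [-1.2, 0.0, 0.9], [-0.9, 0.3, -0.7],
    [1.1, 0.1, 0.6], [0.9, -0.2, -0.5], [1.3, 0.2, 0.8], [1.0, -0.3, -0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

history = sequential_forward_selection(X, y, n_features=2, k=3)
cutoff, j_stat = youden_cutoff([row[0] for row in X], y)
print(history)         # [([0], 1.0), ([0, 1], 1.0)]
print(cutoff, j_stat)  # 0.9 1.0
```

In this toy data the first column alone already separates the two groups, so SFS selects it first and its critical value falls at the lowest hypertensive reading. With real clinical data, k, the selection criterion, and the cut-off rule would all need tuning. Any standardization should also be fitted inside each training fold to avoid leakage.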
Expert Systems with Applications 39 (2012) 8852–8858 (ISSN 0957-4174)
doi:10.1016/j.eswa.2012.02.004
Corresponding author. E-mail address: ieccwang@mail.mcut.edu.tw (C.-C. Wang).

Several studies have applied data mining techniques to medical data. Joe, Andrew, Lynn, Assiamira, and Jennifer (2001) used logistic regression to construct a multivariate forecasting model that classifies the population at risk for diabetes complications. In testing, the model predicted the onset of complications better than glycosylated hemoglobin (HbA1) alone. Bahman and Willam (2003) also applied logistic regression to build a convenient forecasting model for screening diabetics. Their model was constructed from 1032 cases, using data on age, gender, BMI, fasting time, and blood glucose level. Kim, Sohn, and Yoon (2003) used three kinds of growth curves to predict liver disease. They also carried out a comparative analysis of logistic regression, decision trees, and neural networks. From the results of the literature, use