International Journal of Computer Applications (0975 – 8887)
Volume 177 – No. 37, February 2020

The Implementation of Classification Algorithm C4.5 in Determining the Illness Risk Level for Health Insurance Company in Indonesia

Apriyudha Angkasa P.
Universitas Mercu Buana

Devi Fitrianah
Universitas Mercu Buana

ABSTRACT
A fundamental task in health insurance is managing the contribution fees paid by members so that they can be used to finance health services. In the case studied here, the problem is that when a member registers, the insurance fee is neither validated nor adjusted against the applicant's history of illness. This increases financial costs unless the insurer takes other approaches, such as promotive and preventive services, to manage the illnesses of registered members and thereby suppress the financing of health services. The insurer's data can be processed with a classification method, here the C4.5 algorithm. The classification is used to map the illness risk level of health insurance members. Evaluated with ten-fold cross-validation and a confusion matrix, the result of this research is an accuracy of 99.87%.

Keywords
C4.5 Algorithm, Classification, Ten-Fold Cross-Validation, Confusion Matrix, Illness Risk.

1. INTRODUCTION
Health insurance in this study is social insurance created by the government to provide a social guarantee for the population and to manage the relationship between the insurer and all people in the country. Because it is social insurance, the health of prospective members is not checked at the beginning of the registration process.
Adverse selection occurs when the policyholder is better able to anticipate expenses than the insurer [1]. Selection (adverse or advantageous) is the central problem that inhibits the smooth, efficient functioning of competitive health insurance markets [2]. Social insurance persistently faces the problem of members who register with a pre-existing illness. This increases financial costs unless the insurer takes other approaches, such as promotive, preventive, and rehabilitative services, to manage illness. So far, promotive and preventive approaches have been carried out by the health department through outreach to members and through collaboration with public institutions such as Puskesmas and hospitals. The Regulation of the Minister of Health Number 75 of 2014 concerning the Community Health Centre defines a Community Health Centre (Puskesmas) as a health care facility that performs public health efforts and primary individual health efforts, with a priority on promotive and preventive efforts [3]. Primary health care is a suitable setting for interventions to identify and reduce behavioural risk factors and recommend preventive activities (including immunization, screening for cardiovascular risk factors and cancer, and counselling) [4].

Each outreach session is attended by both healthy participants and participants with a history of illness (hospitalized participants), who receive health checkups and are asked about their health. Further analysis of participants with a history of certain illnesses has not been carried out optimally, which drives health costs higher. Based on this issue, the health insurer needs an analysis that determines the risk level of participants, using data-driven decision making to further analyze the data of registered participants. The data mining algorithm used to classify the data is C4.5.
Information systems and information technology play a critical role in helping any organization achieve its goals and succeed in this era of globalization and competition [5]. The C4.5 algorithm is widely used and implemented in decision making. Its advantages are that it can process both numerical and discrete data, can handle missing attribute values, produces rules that are easy to implement, and is among the fastest algorithms compared with others [6]. It can also predict the class of unknown objects [7]. Classification has several approaches, such as decision tree induction, rule-based methods, artificial neural networks, genetic algorithms, and Bayesian networks [8]. C4.5 is one of the most popular algorithms for classification in machine learning and data processing [9]. In C4.5, data are classified through successive levels, from the root to the leaves. Building the decision tree consists of converting the data from table form into a tree model, converting the tree model into rules, and simplifying the rules [10].

2. LITERATURE REVIEW
Classification is one of the data mining techniques that is mainly used to analyze a given dataset and takes each instance of it and assigns this instance to a particular class such that classification error will be least [11]. The C4.5 algorithm is a classification algorithm that produces a decision tree; it was developed by Ross Quinlan in 1979 [12]. The basis of the C4.5 algorithm is building a decision tree by selecting the attribute with the highest priority, that is, the attribute with the highest gain computed from the entropy of the attributes, as the pivot of the classification [13]. The resulting decisions take the form of if-then conditions. C4.5 works by building a decision tree and a rule model [14]. The steps for building a decision tree with C4.5 are as follows:
a. Select as the root the attribute with the highest gain value among the available attributes.
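The root selection in step a can be illustrated with a minimal Python sketch. It assumes the standard Shannon entropy and information gain definitions and uses plain gain, as described above; common descriptions of C4.5 additionally normalize gain by split information (gain ratio), which is omitted here. The records and the attribute names `age_group`, `history`, and `risk` are hypothetical and are not taken from the insurer's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


def select_root(rows, attributes, target):
    """Return the attribute with the highest gain, plus all computed gains."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    return max(gains, key=gains.get), gains


# Hypothetical toy records, for illustration only.
MEMBERS = [
    {"age_group": "young", "history": "no", "risk": "low"},
    {"age_group": "young", "history": "yes", "risk": "high"},
    {"age_group": "old", "history": "no", "risk": "low"},
    {"age_group": "old", "history": "yes", "risk": "high"},
]

root, gains = select_root(MEMBERS, ["age_group", "history"], "risk")
print(root, gains)
```

In this toy data the `history` attribute separates the two risk classes completely, so it has the highest gain and is chosen as the root, while `age_group` carries no information about the class.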