Meryem Saidi
High School of Management
GBM Laboratory, Tlemcen University
miryem.saidi@gmail.com

Mostafa El Habib Daho
Biomedical Engineering Laboratory
Tlemcen University
mostafa.elhabibdaho@gmail.com

Nesma Settouti
Biomedical Engineering Laboratory
Tlemcen University
nesma.settouti@gmail.com

Mohammed El Amine Bechar
Biomedical Engineering Laboratory
Tlemcen University
am.bechar@gmail.com

Abstract

In recent years, the increasing demand for credit has led financial institutions to consider artificial intelligence and machine learning techniques as a way to make decisions in less time. These decision support systems achieve good results in classifying loan applications into good loans and bad loans. However, they suffer from some limitations; mainly, they assume that all misclassification errors have the same financial impact.
In this work, we study the performance of ensemble cost-sensitive algorithms in reducing the most expensive errors. We apply these techniques to the German credit data. By comparing the different algorithms, we demonstrate the effectiveness of cost-sensitive ensemble algorithms in identifying potential loan defaulters to reduce the financial cost.

Keywords

Cost-sensitive learning, credit scoring, ensemble algorithms.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

1 Introduction

Credit scoring is the process of analyzing credit files to decide the creditworthiness of an individual. Distinguishing a good loan applicant from a bad one is important to cut a financial institution's losses [AEW13]. The use of machine learning tools allows auditors to analyze large amounts of information and evaluate credit risk in a reasonable time [Yu17].
These algorithms tend to minimize the classification error and assume that all misclassifications have the same cost. However, the cost of labeling a positive example as negative differs from the cost of labeling a negative example as positive. Indeed, approving a bad loan is much more costly than rejecting a potentially good loan [KBC16]. If a borrower cannot fulfill the loan obligations, this may hurt the bank's profits and cause large financial losses, whereas rejecting a good loan causes only a smaller loss of profit.

Moreover, credit datasets are highly imbalanced, which worsens the situation. Traditional machine learning algorithms tend to maximize accuracy

Page 56 Comparison of ensemble cost sensitive algorithms: application to credit scoring prediction