Meryem Saidi
High School of Management
GBM Laboratory, Tlemcen University
miryem.saidi@gmail.com

Mostafa El Habib Daho
Biomedical Engineering Laboratory
Tlemcen University
mostafa.elhabibdaho@gmail.com

Nesma Settouti
Biomedical Engineering Laboratory
Tlemcen University
nesma.settouti@gmail.com

Mohammed El Amine Bechar
Biomedical Engineering Laboratory
Tlemcen University
am.bechar@gmail.com

Abstract
In recent years, the growing demand for credit has led financial institutions to consider artificial intelligence and machine learning techniques as a way to make decisions more quickly. These decision support systems achieve good results in classifying loan applications as good or bad loans. However, they suffer from some limitations. Mainly, they assume that all misclassification errors have the same financial impact. In this work, we study the performance of ensemble cost-sensitive algorithms in reducing the most expensive errors. We apply these techniques to the German credit data. By comparing the different algorithms, we demonstrate the effectiveness of cost-sensitive ensemble algorithms in identifying potential loan defaulters and thereby reducing the financial cost.

Keywords
Cost-sensitive learning, credit scoring, ensemble algorithms.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

1 Introduction
Credit scoring is the process of analyzing credit files to assess the creditworthiness of an individual. Distinguishing a good loan applicant from a bad one is important to cut financial institutions' losses [AEW13]. Machine learning tools allow auditors to analyze large amounts of information and evaluate credit risk in a reasonable time [Yu17].
These algorithms tend to minimize the overall classification error and assume that all misclassifications have the same cost. However, the cost of labeling a positive example as negative differs from the cost of labeling a negative example as positive. Approving a bad loan is much more costly than rejecting a potentially good one [KBC16]. If a borrower cannot fulfill the loan obligations, the bank's profits suffer and it may incur large financial losses. In contrast, rejecting a good loan only causes a smaller loss of profit. Moreover, credit datasets are highly imbalanced, which worsens the situation. Traditional machine learning algorithms tend to maximize accuracy Page 56 Comparison of ensemble cost sensitive algorithms: application to credit scoring prediction