Journal of Computer System and Informatics (JoSYC)
ISSN 2714-8912 (online), ISSN 2714-7150 (print)
Volume 3, No. 4, August 2022, Pages 410-419
https://ejurnal.seminar-id.com/index.php/josyc
DOI 10.47065/josyc.v3i4.2099
Copyright © 2022 Sindi Fatika Sari. This journal is licensed under a Creative Commons Attribution 4.0 International License.

Employee Attrition Prediction Using Feature Selection with Information Gain and Random Forest Classification

Sindi Fatika Sari, Kemas Muslim Lhaksmana*
Informatics, S1 Informatics, Telkom University, Bandung, Indonesia
Email: 1 sindifatikas@student.telkomuniversity.ac.id, 2* kemasmuslim@telkomuniversity.ac.id
Submitted: 12/08/2022; Accepted: 23/08/2022; Published: 30/08/2022

Abstract: Employee attrition is the loss of employees from a company through resignation, retirement, or other causes. If it is not handled properly, attrition can harm a company, for example through decreased productivity. The company also needs more time and effort to recruit and train new employees to fill vacant positions. Attrition prediction aims to help a company's human resources (HR) department identify the factors that influence employee attrition. This research implements Random Forest and compares three feature selection methods, Information Gain, Select K Best, and Recursive Feature Elimination, to find which one produces the best performance. The implementation of these methods outperforms previous research in terms of accuracy, precision, recall, and F1 score. In preparing this research, the first author collected the dataset, wrote the program, and prepared the manuscript; the second author assisted the first author with programming and preparing the manuscript. In the tests carried out, Information Gain produced the highest accuracy, 89.2%, while Select K Best produced 87.8% and Recursive Feature Elimination produced 88.8%.

Keywords: Classification; Employee Attrition; Feature Selection; Information Gain; Random Forest

1. INTRODUCTION

With the rapid development of the economy and industry, the phenomenon of employee attrition has gradually become more prominent in recent years [1].
In a company or agency, attrition, the process by which the workforce shrinks, often occurs and is caused by various factors. Employee attrition analysis is one part of people analytics, which helps make more appropriate human resource (HR) decisions [2]. Employees are an important element in fulfilling a company's vision and mission. By having superior employees, the company has a competitive advantage over other companies [3]. Therefore, a system is needed that can manage human resources effectively and efficiently.

Losing employees can have a negative impact on the company because it brings new problems if not handled properly. When a company changes employees frequently, its attrition level can be said to be very high. The attrition level is measured by the number of employees who stop working within a certain period of time. A high attrition level causes several problems for the company: time must be spent recruiting, training, and developing new employees to fill vacant positions [4], productivity declines, and new employees have to adapt. As a result, performance is not optimal.

Employee attrition prediction is carried out to determine which factors affect attrition and to provide early information about employee losses that may occur soon, so that the company can take appropriate action. In this study, the prediction uses the IBM HR Analytics dataset obtained from Kaggle.com [5]. The authors compare the Information Gain, Select K Best, and Recursive Feature Elimination (RFE) feature selection methods to find out which factors affect attrition and to give the company early information about possible attrition. The performance of the three feature selection methods is then compared using the Random Forest classification method.
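
To make the Information Gain criterion more concrete, the following is a minimal sketch of how categorical features could be ranked by the reduction in label entropy they provide. It is only an illustration under stated assumptions, not the implementation used in this study: the function names, the column names, and the eight toy records are all hypothetical and are not drawn from the IBM HR Analytics dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, target):
    """Return (feature, gain) pairs sorted from most to least informative."""
    labels = table[target]
    gains = {
        name: information_gain(column, labels)
        for name, column in table.items()
        if name != target
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records, invented purely for illustration.
toy = {
    "overtime": ["yes", "yes", "yes", "no", "no", "no", "no", "no"],
    "department": ["sales", "rnd", "sales", "rnd", "sales", "rnd", "sales", "rnd"],
    "attrition": ["yes", "yes", "no", "no", "no", "no", "no", "no"],
}

ranking = rank_features(toy, target="attrition")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy example the hypothetical "overtime" column ranks first because every record with attrition falls in a single overtime group, whereas the "department" column splits the labels in the same proportions as the whole table and therefore carries no information about them. A feature selection step would keep the top-ranked features and pass only those to the classifier.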
The Random Forest classification method is used because it is well suited to developing predictive models [6]. This study differs from previous work as follows. In research [7], Random Forest produced a good accuracy of 0.85 but low precision, recall, and F1 score values of 0.60, 0.28, and 0.39, respectively. Therefore, this study extends that research to seek better results. In research [8]