Journal of Computer System and Informatics (JoSYC)
ISSN 2714-8912 (media online), ISSN 2714-7150 (media cetak)
Volume 3, No. 4, August 2022, Page 410−419
https://ejurnal.seminar-id.com/index.php/josyc
DOI 10.47065/josyc.v3i4.2099
Copyright © 2022 Sindi Fatika Sari, Page 410
This Journal is licensed under a Creative Commons Attribution 4.0 International License
Employee Attrition Prediction Using Feature Selection with
Information Gain and Random Forest Classification
Sindi Fatika Sari, Kemas Muslim Lhaksmana*
Informatics, S1 Informatics, Telkom University, Bandung, Indonesia
Email: ¹sindifatikas@student.telkomuniversity.ac.id, ²*kemasmuslim@telkomuniversity.ac.id
Submitted: 12/08/2022; Accepted: 23/08/2022; Published: 30/08/2022
Abstract−Employee attrition is the loss of employees from a company through resignation, retirement, or other causes.
If it is not handled properly, attrition can harm the company, for example by lowering productivity, and the company
must also spend more time and effort recruiting and training new employees to fill the vacant positions. Attrition
prediction aims to help the human resources (HR) department identify the factors that influence employee attrition.
This research implements Random Forest classification and compares three feature selection methods, Information Gain,
Select K Best, and Recursive Feature Elimination, to determine which one yields the best performance. The resulting
models outperform previous research in terms of accuracy, precision, recall, and F1 score. The first author collected
the dataset, wrote the program, and drafted the article; the second author assisted with the programming and with
preparing the article. In the tests carried out, Information Gain produced the highest accuracy, 89.2%, while
Select K Best reached 87.8% and Recursive Feature Elimination reached 88.8%.
Keywords: Classification; Employee Attrition; Feature Selection; Information Gain; Random Forest
1. INTRODUCTION
With the rapid development of the economy and industry, employee attrition has become increasingly common in
recent years [1]. In a company or agency, attrition, that is, a reduction in the number of employees, occurs
frequently and is caused by various factors. Analyzing employee attrition is one part of people analytics, which
helps organizations make more appropriate human resource (HR) decisions [2]. Employees are an important element
of a company in fulfilling its vision and mission. By having superior employees, a company gains a competitive
advantage over other companies [3]. Therefore, a system is needed that can manage human resources effectively
and efficiently.
A reduction in employees can have a negative impact on the company because it brings new problems if
not handled properly. When a company replaces employees frequently, its attrition level can be said to be
very high. The attrition level is measured by the number of employees who stop working within a certain
period of time. A high attrition level can cause several problems for the company: time must be spent
recruiting, training, and developing new employees to fill vacant positions [4], productivity declines,
and new employees need time to adapt. As a result, performance is not optimal.
Employee attrition prediction is carried out to determine which factors affect attrition and to provide
early information about employee reductions that may occur soon, so that the company can take appropriate
action. In this study, the prediction is made using the IBM HR Analytics dataset obtained from
Kaggle.com [5].
In this study, the authors compare the Information Gain, Select K Best, and Recursive Feature Elimination
(RFE) feature selection methods to find out which factors affect attrition and to provide the company with
early information about possible employee attrition. The performance of the three feature selection methods
is then compared using the Random Forest classification method. Random Forest is used because it is well
suited to developing predictive models [6].
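As a minimal sketch of how Information Gain can rank candidate factors before classification, the example below computes the entropy of the attrition label and the reduction in entropy obtained by splitting on each categorical feature. The records are made up for illustration and are not taken from the IBM HR Analytics dataset; the feature names `OverTime` and `Gender` and the helper functions `entropy` and `information_gain` are likewise assumptions, not the implementation used in this study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (illustrative only, not real data).
attrition = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
features = {
    "OverTime": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
    "Gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
}

# Rank features from most to least informative about the label.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], attrition),
                 reverse=True)
for name in ranking:
    print(f"{name}: {information_gain(features[name], attrition):.3f}")
```

In a full pipeline, the highest-ranked features would be kept and passed to the classifier, and the same train/test split would be reused for each feature selection method so that the comparison is fair.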
This study differs from previous work as follows. In research [7], Random Forest produced a good accuracy
of 0.85 but low precision, recall, and F1 score: precision was 0.60, recall was 0.28, and the F1 score was
0.39. Therefore, this study builds on that research to seek better results. In research [8]