An Intelligent Arabic Model for Recruitment Fraud Detection Using Machine Learning

Mohamed A. Sofy 1,*, Mohammed H. Khafagy 2, and Rasha M. Badry 1

1 Information System Department, Faculty of Computers and Information, Fayoum University, Fayoum 63511, Egypt; Email: rmb01@fayoum.edu.eg
2 Computer Science Department, Faculty of Computers and Information, Fayoum University, Egypt; Email: mhk00@fayoum.edu.eg
*Correspondence: ma3152@fayoum.edu.eg

Abstract—In recent years, with the tremendous growth of digital transformation and the constant need for companies to hire employees, a large number of fraudulent job postings have appeared on the Internet. A fraudulent job posting is a carefully planned scam that targets job seekers for a variety of illegitimate purposes, and it can lead to a loss of money and effort. In this work, an intelligent Arabic model is built to detect fraudulent job postings on the Internet using machine learning, data mining, and classification techniques. The proposed model is applied to an Arabic version of the EMSCAD dataset. The dataset is publicly available in English, was collected from a real-life system, and consists of several features, such as company profile, company logo, and interview questions, along with further features that depend on the job advertisements. First, EMSCAD is translated into Arabic. Then, a set of classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and K-Nearest Neighbor (KNN), is used to detect the fraudulent jobs. Finally, the results are compared to determine the best classifier for detecting fraudulent jobs. The proposed model achieved its best results with the Random Forest classifier, reaching 97% accuracy.

Keywords—data mining, fraud detection, online recruitment, machine learning, EMSCAD dataset

I. INTRODUCTION

The trend toward technology requires us to continually research how to develop our services.
Companies have come to rely on working over the Internet (working from home). Recruitment has become very easy to carry out online through many recruitment portals, such as “Wazef” and “frasna”, where job seekers upload their resumes and skills. Companies check these files and contact the candidates to schedule an online interview. In February 2022 alone, 174 cases of employment scams were reported across Australia. Of these cases, 16.1% involved a financial loss, and the losses totaled 142,762 Australian dollars [1]. Figures of this size encourage criminals to pursue data theft, taking advantage of young people’s need for work.

Cybercrime is the use of a computer as an instrument to further illegal ends, such as committing fraud, infringing intellectual property, stealing identities, or violating privacy. Cybercrime, especially through the Internet, has grown in importance as the computer has become central to commerce and entertainment. It causes many problems for individuals and society: the worldwide cost of damage was estimated to reach 6 trillion dollars by 2021 [2]. Therefore, the theft of confidential data must be prevented. To address this issue, prevention and detection methods can be used.

Egypt’s Vision 2030 is heading towards digital transformation and automation in various fields of work, raising the flexibility and competitiveness of the economy, increasing employment rates and decent job opportunities, and improving the business environment and society. It is therefore necessary to monitor and detect fraudulent recruitment operations, whose purpose is to steal individuals’ data and threaten them: criminals publish fake job advertisements and exploit job seekers. This causes individuals to lose money and data, damages the reputation of organizations, and threatens society.
Machine learning is one of the most important solutions, with many examples in daily life, such as Netflix and Siri. Companies also use machine learning to forecast future trends, improve customer service, and reduce costs. However, most of the available data is in languages other than Arabic, such as English or French. Reducing these crimes in the Arab region requires a large volume of data in Arabic, yet such data is lacking. It was therefore important to translate the available dataset into Arabic; this is a new challenge addressed in this paper, as no Arabic version existed before. Arabic is difficult to process and understand because it contains many synonyms and has several varieties: Classical Arabic, as in the language of the Qur’an; Modern Standard Arabic, used in official conversations and on television; and the vernacular. This ambiguity makes it difficult for machines to learn Arabic and make decisions.

Manuscript received July 17, 2022; revised September 13, 2022; accepted September 28, 2022; published February 17, 2023.
Journal of Advances in Information Technology, Vol. 14, No. 1, February 2023. doi: 10.12720/jait.14.1.102-111

In addition, it has become very beneficial to use data mining and data analysis to predict and detect fraudulent