Bulletin of Electrical Engineering and Informatics
Vol. 13, No. 5, October 2024, pp. 3628-3635
ISSN: 2302-9285, DOI: 10.11591/eei.v13i5.7594
Journal homepage: http://beei.org

Genetic programming in machine learning based on the evaluation of house affordability classification

Suraya Masrom¹, Norhayati Baharun², Nor Faezah Mohamad Razi², Abdullah Sani Abd Rahman³, Nor Hazlina Mohammad², Nor Aslily Sarkam²

¹Computer Sciences Studies, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA Perak Branch, Perak, Malaysia
²Mathematical Sciences Studies, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA Perak Branch, Perak, Malaysia
³Faculty of Sciences and Information Technology, Universiti Teknologi PETRONAS, Perak, Malaysia

Article Info
Article history: Received Sep 22, 2023; Revised Feb 24, 2024; Accepted Mar 31, 2024

ABSTRACT
One of the major challenges in machine learning is the difficulty of achieving high accuracy within a short completion time. Further difficulties arise when the algorithm must be applied to a real dataset obtained through survey-based data collection. Imbalanced datasets, weak correlations, and outliers are common problems in real datasets. To accelerate the modelling process, automated machine learning based on meta-heuristic optimization such as genetic programming (GP) has started to emerge and is gaining popularity. However, identifying the best hyper-parameters of the meta-heuristic algorithm remains a critical issue. This paper evaluates GP hyper-parameters in machine learning modelling on a house affordability dataset. One important hyper-parameter of GP is the population size (PS), which was examined under different settings in this research. Machine learning with GP was used to predict house affordability among employers, with transport expenditure and job mobility among the attributes.
Results of testing on hold-out samples show that GP-based machine learning can reach 70% accuracy with a split ratio of 0.2 and a GP PS of 30. This research contributes to the advancement of automated machine learning techniques, offering the potential for faster and more accurate modelling of real survey-based datasets.

Keywords: Crossover rate; Genetic programming; House affordability; Machine learning; Mutation rate; Population size

This is an open access article under the CC BY-SA license.

Corresponding Author:
Norhayati Baharun
Mathematical Sciences Studies, College of Computing, Informatics and Mathematics
Universiti Teknologi MARA Perak Branch
35400 Tapah Road, Perak, Malaysia
Email: norha603@uitm.edu.my

1. INTRODUCTION
Machine learning has become prevalent across many domains of real-world problems owing to the evolution of Industry 4.0. Its ever-expanding reach has helped industries, businesses, government agencies, and public and private individuals make fast decisions on both simple and complex problems. Machine learning is used extensively in medicine [1], [2], education [3]-[6], agriculture [7], finance and economics [8], [9], building and property [10]-[12], and engineering [13]. As a result, there is a critical demand to reduce the implementation complexity of machine learning so that it can be used by non-expert or inexperienced data scientists from various research fields. Introducing rapid tools for novice machine learning users is therefore highly significant.
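The evaluation protocol summarized in the abstract, in which GP classifiers built with different population sizes are compared on hold-out samples with a split ratio of 0.2, can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and does not use any AutoML library: the tree representation, the function set, size-3 tournament selection, elitism, the crossover rate of 0.9, the mutation rate of 0.1, the depth limit, and the synthetic data are all assumptions made for illustration, and every function name (`random_tree`, `evolve`, `holdout_split`, `make_toy_data`, and so on) is hypothetical. Each individual is an arithmetic expression over the input features, and a record is classified as positive when the expression evaluates to a value above zero.

```python
# Hypothetical minimal GP classifier sketch; not the authors' implementation.
import operator
import random
from functools import reduce

FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}  # GP function set

def random_tree(rng, n_features, depth):
    """Grow a random expression tree stored as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))  # feature terminal
        return ("c", rng.uniform(-1.0, 1.0))  # constant terminal
    return (rng.choice(list(FUNCS)), random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))

def evaluate(tree, row):
    """Recursively compute the tree's output for one feature vector."""
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def depth(tree):
    if tree[0] in FUNCS:
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0  # terminals have depth 0

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices 1 or 2) to every node."""
    yield prefix
    if tree[0] in FUNCS:
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)

def accuracy(tree, X, y):
    """Predict class 1 when the tree output is positive; return the hit rate."""
    hits = sum((evaluate(tree, row) > 0) == bool(label) for row, label in zip(X, y))
    return hits / len(y)

def evolve(X, y, pop_size, generations=20, crossover_rate=0.9,
           mutation_rate=0.1, max_depth=6, seed=0):
    """Evolve `pop_size` trees; fitness is training accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_tree(rng, n, 3) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(accuracy(t, X, y), t) for t in pop]
        def pick():  # size-3 tournament selection
            group = rng.sample(scored, min(3, len(scored)))
            return max(group, key=lambda s: s[0])[1]
        children = [max(scored, key=lambda s: s[0])[1]]  # elitism: keep the best
        while len(children) < pop_size:
            parent = child = pick()
            if rng.random() < crossover_rate:  # subtree crossover
                donor = pick()
                p = rng.choice(list(paths(child)))
                q = rng.choice(list(paths(donor)))  # donor subtree to graft
                child = replace_subtree(child, p, reduce(lambda t, i: t[i], q, donor))
            if rng.random() < mutation_rate:  # subtree mutation
                p = rng.choice(list(paths(child)))
                child = replace_subtree(child, p, random_tree(rng, n, 2))
            # Offspring deeper than max_depth fall back to the parent (limits bloat).
            children.append(child if depth(child) <= max_depth else parent)
        pop = children
    return max(pop, key=lambda t: accuracy(t, X, y))

def holdout_split(X, y, test_ratio, seed=0):
    """Shuffle, then hold out `test_ratio` of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def make_toy_data(n=300, seed=42):
    """Synthetic stand-in data (NOT the house affordability survey)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    y = [int(r[0] - 0.5 * r[1] + rng.gauss(0, 0.2) > 0) for r in X]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    X_tr, y_tr, X_te, y_te = holdout_split(X, y, test_ratio=0.2)
    for ps in (10, 30, 50):
        model = evolve(X_tr, y_tr, pop_size=ps)
        print(f"PS={ps}: hold-out accuracy {accuracy(model, X_te, y_te):.2f}")
```

Because every population size is trained and scored on the same shuffled 80/20 split, the printed accuracies differ only through the PS setting and the GP's own randomness; changing `test_ratio` varies the split ratio in the same way. The figures come from toy data and say nothing about the 70% accuracy reported for the house affordability dataset.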