A Predictive Model for the Risk of Infertility in Men Using Machine Learning

1 Nişantaşı University Faculty of Medicine, Department of Medical Statistics, İstanbul, Turkiye
2 Ondokuz Mayıs University Faculty of Medicine, Department of Biostatistics, Samsun, Turkiye
3 Hacettepe University Faculty of Medicine, Department of Biostatistics, Ankara, Turkiye

Introduction

The World Health Organization defines infertility as 12 months of frequent, unprotected intercourse without pregnancy (1). Infertility is a medical and social problem that affects about 15% of couples, and in 40% of these couples it is attributable to a male factor (2). It is a worldwide problem, and in Turkey alone an estimated 10-15% of couples are infertile (3). Male infertility is a highly heterogeneous disorder in which genetic causes play an important role. Karyotypic abnormalities, cystic fibrosis transmembrane conductance regulator gene mutations, and microdeletions of the Y chromosome are well-known genetic causes in azoospermic or severely oligozoospermic men (4,5). Diverse external factors also contribute to infertility, including age, smoking, and obesity (3).

Prediction uses the variables in a data set to find patterns that describe the structure of the data in a form humans can interpret (6). Machine learning is a fast-growing field that explores how computers can automatically learn to recognize complex data structures and draw conclusions from a set of observed data (7). Machine learning applications are now part of daily life in many areas, for example web searches, spam/email filtering, face recognition programs, and speech recognition programs (8). Machine learning has been used to classify many kinds of medical data, with promising results across different data sets.
However, the gathering and cataloguing of more complex data types, the discovery of new diseases, and the development of new diagnostic methods have increased the need for machine learning methods in medicine, which provide new ways of interpreting the complex data sets that researchers face (9,10). Machine learning is divided into subfields that deal with different types of learning tasks. Supervised learning is the one most commonly used in practice and can be grouped into classification and regression. The number and variety of classification algorithms grow day by day; commonly used ones include decision trees (DT), K nearest neighbor (KNN), naive Bayes (NB), support vector machines (SVM), and random forest (RF) (11,12).

ORIGINAL RESEARCH
Andrology
Senem Koç 1, Leman Tomak 2, Erdem Karabulut 3
Doi: 10.4274/jus.galenos.2022.2021.0134
J Urol Surg
Correspondence: Senem Koç, Nişantaşı University Faculty of Medicine, Department of Medical Statistics, İstanbul, Turkiye
Phone: +90 532 300 34 88
E-mail: koc.senem@nisantasi.edu.tr
ORCID-ID: orcid.org/0000-0002-8357-3431
Received: 09.12.2021
Accepted: 17.04.2022

Abstract

Infertility is a worldwide problem and causes considerable social, emotional, and psychological stress between couples and among families. This study aimed to identify the machine learning classifier that yields the most effective predictive model of the risk of infertility in men from genetic and external factors. The data set was collected in the Department of Urology at Ondokuz Mayıs University. The models were built with supervised learning algorithms, including decision tree, K nearest neighbor, naive Bayes, support vector machines, random forest, and Superlearner. Classifier performance was assessed with the area under the curve.
The performance evaluation showed that the support vector machines and Superlearner algorithms had an area under the curve of 96% and 97%, respectively, outperforming the remaining classifiers. According to the variable importance results, sperm concentration, FSH, LH, and some genetic factors are the important risk factors for infertility. Applied to a patient's record of infertility risk factors, these findings can be used to predict the risk of infertility in men. The predictive model developed can be integrated into existing health information systems and used by urologists to predict patients' risk of infertility in real time.

Keywords: Classification, superlearner, prediction model, infertility, genetic factors

Cite this article as: Koç S, Tomak L, Karabulut E. A Predictive Model for the Risk of Infertility in Men Using Machine Learning.
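
The abstract compares classifiers by the area under the curve. As a minimal sketch of what that metric measures, assume binary labels (1 = infertile, 0 = fertile) and a numeric risk score from each classifier; the area under the ROC curve then equals the proportion of positive-negative pairs that the score ranks correctly. The function name `roc_auc`, the toy labels, and both score lists are hypothetical illustrations, not the study's data or software.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = infertile, 0 = fertile (not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # made-up scores, classifier A
scores_b = [0.6, 0.4, 0.5, 0.7, 0.2, 0.3]  # made-up scores, classifier B

print(round(roc_auc(labels, scores_a), 3))  # 0.889
print(round(roc_auc(labels, scores_b), 3))  # 0.667
```

On these toy scores classifier A ranks 8 of 9 pairs correctly and classifier B ranks 6 of 9, so A would be preferred under this criterion. The pairwise loop is quadratic in the sample size; it is chosen here for readability, not efficiency.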