Journal of Data Analysis and Information Processing, 2023, 11, 11-36
https://www.scirp.org/journal/jdaip
ISSN Online: 2327-7203 ISSN Print: 2327-7211
DOI: 10.4236/jdaip.2023.111002

Modelling Key Population Attrition in the HIV and AIDS Programme in Kenya Using Random Survival Forests with Synthetic Minority Oversampling Technique-Nominal Continuous

Evan Kahacho 1*, Charity Wamwea 1, Bonface Malenje 1, Gordon Aomo 2

1 Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
2 Monitoring and Evaluation, Kenya Red Cross Society-Global Fund Unit, Nairobi, Kenya

Abstract

HIV and AIDS remain a major public health concern and are among the epidemics that the world resolved to end by 2030, as set out in the Sustainable Development Goals (SDGs). Substantial efforts have been made to reduce new HIV infections, yet a significant number of new infections are still reported. HIV prevalence is disproportionately concentrated among key populations, which include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study used a retrospective design and focused on key populations enrolled in a comprehensive HIV and AIDS programme run by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were lost to follow-up, defaulted (dropped out, transferred out, or relocated), or died were classified as attrition, while those who were active and alive at the end of the study were classified as retention. The study used density analysis to determine spatial differences in key population attrition across the 19 targeted counties, and used Kilifi County as an example to map attrition cases at a finer administrative level (sub-county level).
The study used the synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets, because attrition cases were far fewer than retention cases. A random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality, and their survival times using the estimated Kaplan-Meier survival function. Its predictive performance was strong and well above random chance (a concordance index of 0.5), with concordance indices greater than 0.75.

How to cite this paper: Kahacho, E., Wamwea, C., Malenje, B. and Aomo, G. (2023) Modelling Key Population Attrition in the HIV and AIDS Programme in Kenya Using Random Survival Forests with Synthetic Minority Oversampling Technique-Nominal Continuous. Journal of Data Analysis and Information Processing, 11, 11-36. https://doi.org/10.4236/jdaip.2023.111002

Received: October 10, 2022
Accepted: January 28, 2023
Published: January 31, 2023

Copyright © 2023 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/

Open Access
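The abstract summarises predictive performance as concordance indices greater than 0.75. As a minimal sketch of what that number measures, the Python function below implements Harrell's concordance index for right-censored data, treating attrition as the event and retention at the end of the study as censoring. It assumes that a higher predicted ensemble mortality means a higher risk of attrition, and, as a simplification, it skips pairs with tied follow-up times. The function name `harrell_c_index` and the example data are hypothetical illustrations, not the authors' code or data.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up time for each individual
    events : 1 if attrition was observed, 0 if censored (retained at study end)
    risks  : predicted risk score, e.g. ensemble mortality (higher = riskier)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the individual with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs whose earlier time is
        # censored, because the true ordering of their events is unknown.
        if times[a] == times[b] or events[a] == 0:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier attrition received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: five clients, follow-up time in months.
times = [2, 5, 7, 10, 12]
events = [1, 1, 0, 1, 0]            # 1 = attrition observed, 0 = censored
risks = [0.9, 0.6, 0.7, 0.2, 0.1]   # hypothetical ensemble mortality values
print(harrell_c_index(times, events, risks))  # 0.875
```

Here 7 of the 8 comparable pairs are ordered correctly, giving 0.875. Constant predictions make every comparable pair a tie worth 0.5, so the index is exactly 0.5; that is the random-chance baseline against which a value above 0.75 is judged.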