(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 2, 2021
www.ijacsa.thesai.org

Exploratory Study of Some Machine Learning Techniques to Classify the Patient Treatment

Mujiono Sadikin¹, Ida Nurhaida²
Faculty of Computer Science, Universitas Mercu Buana, Jakarta, Indonesia

Ria Puspita Sari³
Siloam Hospital, Jakarta, Indonesia

Abstract: Numerous studies have been carried out on computation and its applications to medical data, with proven benefits for improving the quality of public health. However, not all research results or practical applications can be applied under all conditions; they must fit the context, such as community culture, geography, or citizen behavior. Unfortunately, the use of digital data in Indonesia is still very limited. The objective of this study is to assess various data mining techniques for using laboratory test results, collected from a private hospital in Indonesia, to predict the next patient treatment. To that end, various machine learning classification techniques were explored. Based on the experiments, XGBoost with hyperparameter tuning produced the best accuracy, 0.7579, compared with the other classifiers. Accuracy could be improved further by enriching the dataset with other data types, such as the patient's medical record history.

Keywords: Electronic health record; XGBoost; patient treatment; patient laboratory test data

I. INTRODUCTION

The success of medical treatment services depends on the quality of health services and on the precision of the information related to the medical aspects [1]. Unfortunately, access to relevant medical information is increasingly difficult because of the rapid growth in data volume and its heterogeneous formats. Health care is one of the most complex industries, involving many stakeholders and a variety of tools and technologies [2].
New computational techniques are therefore continually needed to handle this type of data and to address problems related to medical information. Various techniques have been applied to medical datasets for many purposes. Because the medical and health fields are so broad, the research topics range from diagnosis [3]-[5], diseases [3], [4], [6]-[11], and patient's condition [12]-[17] to prescription and medication [1], [18]-[21] and genomics [22]-[24].

The development of a health system, together with its problems and challenges, is tightly related to multiple factors and contexts such as geographic location, local regulation, community demographics, and wealth level. Contextual factors are the most important element in developing health and medical research, as endorsed by the Agency for Healthcare Research and Quality (USA) [25]. This view is in line with the World Health Organization, which supports achieving the best quality of health services and medical device operation based on context [26]. Studies of the contextual factors in developing the Primary Health Center (PHC) [39] showed that many factors need to be considered, such as social models, an institutional context that promotes risk-averseness and patient care, infrastructure, community expectation, and doctors' disinterest in primary care roles.

Unfortunately, in our local context, i.e., Indonesia, studies on medical records or electronic health data are very limited relative to the country's very large population and area. Some of the studies that focus on the local context are published in [11] and [15]. The first article presents the results of a study on early dengue disease detection, with a dataset captured from several public health centers (PUSKESMAS).
The study overcame the problem associated with physical methods of detecting the patient's symptoms by comparing some conventional classifiers with the ELM technique. In the second study, the authors identified toddlers' nutritional status using a clustering method that groups toddlers into five clusters: good, moderate, malnutrition, over, and obesity. Another study enriched an ontology in the tuberculosis epidemiology domain using pulmonary tuberculosis (TB) scientific documents [27].

Considering the importance of the right context in conducting health and medical research, this study was conducted to make use of patient medical data from the results of laboratory tests taken at a private hospital in Indonesia. This dataset is the main factor doctors consider when determining the next course of action for patients: whether they need to be hospitalized (in-patient care) or not (out-patient care). Various studies in the literature show that AI- and data-science-based tools can improve the quality of health services. To the authors' knowledge, there are no AI-based tools available in the field of health care in Indonesia. This research is an attempt to contribute to this field. In this study, we explore several machine learning techniques to classify patient treatment based on laboratory test result data. XGBoost with grid-search hyperparameter optimization outperforms the other techniques. In addition, this research used EHR data from the local context to determine the characteristics of patients' treatment, with similar pattern distributions. The authors therefore propose machine learning techniques to handle this problem.

The article is organized as follows: the first section