(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 1, 2023
23 | Page www.ijacsa.thesai.org

Recognizing Safe Drinking Water and Predicting Water Quality Index using Machine Learning Framework

Mohamed Torky¹, Ali Bakhiet², Mohamed Bakrey³, Ahmed Adel Ismail⁴, Ahmed I. B. EL Seddawy⁵
¹Faculty of Artificial Intelligence, Egyptian Russian University (ERU), Badr City, Egypt
²,³Higher Institute of Computer Science and Information Systems, Culture & Science City, Giza, Egypt
⁴The Higher Institute of Computer and Information Systems, Abo Qir Alexandria 21913, Egypt
⁵Arab Academy for Science and Technology and Maritime Transport, Cairo, Egypt

Abstract—Water quality monitoring, analysis, and prediction have emerged as important challenges across the many uses of water in daily life. Recent water quality problems have raised the need for artificial intelligence (AI) models that analyze water quality, classify water samples, and predict the water quality index (WQI). This paper proposes a machine learning framework for classifying drinking water samples as safe or unsafe and for predicting the WQI. The classification tier of the proposed framework consists of nine machine learning models, which were applied, tested, validated, and compared on a benchmark dataset to classify drinking water samples into two classes (safe/unsafe). The regression tier consists of six regression models applied to the same dataset to predict the WQI. The experimental results showed good classification performance for all nine models, with an average accuracy of 94.7%. Among them, the Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) models outperformed the other models in the framework at recognizing safe drinking water samples, in terms of both training and testing accuracy.
The regression results showed that the LightGBM (LGBM) regression and Extra Trees regression models performed best at predicting the WQI, with training and testing accuracies of 0.99 and 0.95, respectively. The mean absolute error (MAE) results also showed that these two models achieved a lower error rate (10%) than the other applied regression models. These findings have significant implications for understanding how novel deep learning models can be developed for predicting water quality, and such models may also suit other environmental and industrial purposes.

Keywords—Water quality; artificial intelligence; machine learning; deep learning; classification analysis; regression analysis

I. INTRODUCTION

In the new green economy, monitoring and evaluating water quality is central to the life of all organisms. Classical approaches that rely on chemical monitoring are not sufficient to evaluate the consequences of some influences and stresses, because the interactive effects of different chemical variables on aquatic microorganisms are very difficult to predict [1]. Rapid industrial development has degraded water quality at an alarming rate. In addition, poor infrastructure, a lack of public awareness, and low hygiene standards greatly affect the quality of drinking water [2]. Polluted drinking water is a serious problem: it can harm the health of organisms and cause many environmental and infrastructural impacts. According to a United Nations (UN) report, more than 1.5 million people die every year from diseases caused by polluted water. In developing countries, 80% of health problems have been attributed to polluted water. Moreover, 2.5 billion illnesses and five million deaths are reported annually [3], which are alarming figures.
Because robust water monitoring techniques are lacking, many countries are unable to improve their water systems or to build effective water recovery systems. These shortcomings may increase uncertainty when water resource management policies are developed [4]. Recently, there has been a marked increase in the development of biological monitoring and biological assessment tools for water resources that are reliable enough to manage many degraded water bodies in the USA, Europe, South Africa, and Australia [5]. However, as monitoring devices generate ever larger volumes of data and manual coding becomes impractical, these systems have begun to show shortcomings because they lack an effective mechanism for processing such large amounts of data. The growth of artificial intelligence based on machine learning and deep learning techniques offers a promising solution to this problem, since AI provides many prediction, clustering, and classification techniques that can produce effective solutions to water quality problems [6]. Research over the past decades has focused largely on analyzing the water quality of rivers with AI techniques [7]. AI models make water quality forecasting, classification, and risk assessment easier to achieve. Moreover, advanced early warning systems and effective management policies can be designed to extend control and monitoring of rivers and other water bodies [8, 9]. This paper introduces a machine learning framework for analyzing water quality. It consists of two subsystems. The first subsystem classifies water quality using nine AI models that were applied, tested, and compared on various drinking water samples to classify each sample as safe or unsafe to drink. The applied nine AI