Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed
under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted
non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Scientific Research in Computer Science, Engineering and Information Technology
ISSN : 2456-3307 (www.ijsrcseit.com)
doi : https://doi.org/10.32628/IJSRCSEIT
Analysis and Prediction of Water Quality Data using Machine
Learning Approaches and Exploratory Data Analysis
Ravindra Changala¹, A Tharun², A Sai Akshith³, K Karthik⁴
¹ Assistant Professor, IT Department, Guru Nanak Institutions Technical Campus, Hyderabad, India
²,³,⁴ IT Department, Guru Nanak Institutions Technical Campus, Hyderabad, India
Article Info
Publication Issue :
Volume 8, Issue 6
November-December-2022
Page Number : 188-193
Article History
Accepted: 07 Nov 2022
Published: 19 Nov 2022
ABSTRACT
Drinking Water Supply (DWS) is one of the most critical and sensitive systems
for maintaining city operations worldwide. In Europe, the mismatch between
fast population growth and obsolete water supply infrastructure is even more
prominent. High water quality standards not only make people’s daily lives
more convenient but also put pressure on the risk response time of the
system. Prevailing water quality regulations rely on periodic parameter
tests. Because a test can take 24-48 hours, bacteria may spread while the
results are pending. To cope with these problems, we propose an EDA
(Exploratory Data Analysis) model for water quality assessment. The model has
two dimensions: water quality parameters and a score. We then apply the model
to predict water quality changes in the DWS system using a Random Forest
algorithm implemented with PyCaret. As a case study, we select an industrial
water supply system. The preliminary results show that the model reaches a
prediction accuracy of 73.76% for water quality.
Keywords: Water Quality Monitoring, Water Quality Assessment, Water
Quality Analysis, Chain of Custody.
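As a rough illustration of the two dimensions named in the abstract (water quality parameters and a score), the following minimal sketch shows the kind of first-pass exploratory summary such an analysis might start from. It uses only the Python standard library. The records, the parameter names `ph` and `turbidity`, the label name `score`, and the helper functions are hypothetical and chosen for illustration; they are not the authors' dataset or code, and the sketch does not reproduce the Random Forest or PyCaret step.

```python
import statistics

# Hypothetical sample records (NOT data from the paper).
# None marks a missing measurement; "score" is an assumed 0/1 quality label.
samples = [
    {"ph": 7.1, "turbidity": 3.2, "score": 1},
    {"ph": 6.4, "turbidity": None, "score": 0},
    {"ph": None, "turbidity": 4.8, "score": 0},
    {"ph": 8.0, "turbidity": 2.9, "score": 1},
]


def summarize(records, column):
    """Return count, missing count, mean, min and max for one numeric column."""
    values = [r[column] for r in records if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        # Guard against columns with no observed values.
        "mean": statistics.mean(values) if values else None,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


def class_balance(records, label):
    """Count how many records fall into each value of the label column."""
    counts = {}
    for r in records:
        counts[r[label]] = counts.get(r[label], 0) + 1
    return counts


if __name__ == "__main__":
    for column in ("ph", "turbidity"):
        print(column, summarize(samples, column))
    print("score balance:", class_balance(samples, "score"))
```

A summary of this kind exposes missing measurements and label imbalance before any model is trained, which is one reason exploratory analysis usually comes before the prediction step.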
I. INTRODUCTION
Water plays a vital role in everyone’s life and is
found everywhere and in many forms [1]. In
today’s world, climate change and pollution have
degraded water quality in many areas, and various
experiments are carried out to test the quality of
water [2]. Poor water quality creates risks in
industrial areas, damaging the environment and
causing economic loss [3]. Many diseases, such as
typhoid, diarrhea, and cholera, are rooted in the use
of contaminated water, a problem aggravated by
increasing industrialization and urbanization in
India [4]. According to reports from WHO, an
estimated 77 million people in India are affected by
contaminated water, and 21% of diseases are caused
by it [5]. Because of insufficient rainfall and the
drying up of the main reservoirs that supply water,
India frequently faces water crises, making water
one of the most precious and limited resources.
Many organizations, including WHO and BIS,
have framed standards for water parameters