International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 3966

GDPS - General Disease Prediction System

Shratik J. Mishra 1, Albar M. Vasi 2, Vinay S. Menon 3, Prof. K. Jayamalini 4

1,2,3,4 Department of Computer Engineering, Shree L.R. Tiwari College of Engineering, Maharashtra, India

Abstract - The successful application of data mining in highly visible fields like e-business, commerce and trade has led to its application in other industries. The medical environment is still information rich but knowledge poor. A wealth of data is available within medical systems, but there is a lack of powerful analysis tools to identify hidden relationships and trends in that data. Disease is a term that refers to a large number of health conditions related to the body. These conditions describe unexpected health problems that directly affect the body parts. Medical data mining techniques like association rule mining, classification and clustering are implemented to analyze the different kinds of general body problems. Classification is an important problem in data mining. A number of popular classifiers construct decision trees to generate class models. Here, data classification is based on the ID3 decision tree algorithm, its accuracy is estimated using entropy-based cross-validation and partition techniques, and the results are compared.

Key Words: Disease, prediction, machine learning, data mining, ID3.

1. INTRODUCTION

It is estimated that more than 70% of people in India are prone to general body diseases such as viral infections, flu, cough and cold every 2 months.
Because many people do not realize that general body diseases could be symptoms of something more harmful, 25% of the population succumbs to death after ignoring the early general body symptoms. This is a dangerous and alarming situation for the population. Hence, identifying or predicting the disease at the earliest is very important to avoid unwanted casualties. The currently available systems are either dedicated to a particular disease or, for generalized disease, are still in the research phase for algorithms. The purpose of this system is to provide prediction for general and more commonly occurring diseases that, when left unchecked, can turn fatal. The system applies data mining techniques and the ID3 decision tree algorithm. It predicts the most probable disease based on the given symptoms, along with the precautionary measures required to avoid aggravation of the disease. It will also help doctors analyse the pattern of disease occurrence in society. In this project, the disease prediction system carries out data mining in its preliminary stages, and the system is trained using machine learning and data mining.

The paper is divided into five sections. The first section gives a brief introduction to the system. The second section covers data mining and the study of related existing systems. The third section details the implementation of the system. The fourth section provides the results obtained using mining algorithms. Finally, the conclusion gives a summary and the future scope of the system.

2. LITERATURE REVIEW

This section presents the literature survey for the project: the existing projects and systems in actual use from which the makers of this project took inspiration before deciding to address the problem statement.
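As background for the ID3 approach named in the introduction: ID3 chooses, at each node, the attribute whose split gives the largest reduction in entropy (the information gain). The following is a minimal Python sketch of that selection step only, not the authors' implementation. The symptom names, the toy records and the function names are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the records on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        # Group class labels by the value this record has for the attribute.
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records (assumed for illustration, not from the paper).
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["flu", "flu", "cold", "cold"]

chosen = best_attribute(rows, labels, ["fever", "cough"])
```

In this toy data the "fever" attribute separates the two classes completely, so it has the higher information gain and would be selected as the root split. A full ID3 tree repeats the same selection recursively on each partition.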
2.1 Existing Systems

Narander Kumar and Sabita Khatri [1] compared different algorithms such as k-NN, Naïve Bayes, Random Forest and J48 using performance measures like ROC, kappa statistic, RMSE and MAE in the WEKA tool, and also compared the classifiers on various accuracy measures. Their conclusion was that Random Forest has better accuracy for the chronic kidney disease dataset that was used.

Monika Gandhi and Dr. Shailendra Singh [2] analyzed different data mining algorithms like Naïve Bayes, neural networks and decision trees for their accuracy in predicting heart disease.

Marija Sultana, Afrin Haider and Md.Shorif Uddin [3] analyzed algorithms such as K-star, J48, SMO, Bayes Net and Multilayer Perceptron using the WEKA tool on a heart disease prediction dataset. The performance of these data mining techniques is assessed by combining the results of measures such as predictive accuracy, ROC curve and AUC value. The result obtained is that SMO and Bayes Net perform better than the other techniques mentioned.

Girija D.K, Dr. M.S. Shashidhara and M.Giri [4] use neural networks to predict the presence of uterine fibroid disease. The experimental results show an accuracy of 98% using the multilayer perceptron neural network and data mining.

Another project focuses on the most common form of cancer in women, breast cancer, and its recurrence. In it, the authors Uma Ojha and Dr. Savita Goel have researched many data mining algorithms in both