I.J. Intelligent Systems and Applications, 2022, 5, 35-46
Published Online on October 8, 2022 by MECS Press (http://www.mecs-press.org/)
DOI: 10.5815/ijisa.2022.05.04
This work is open access and licensed under the Creative Commons CC BY License.
Volume 14 (2022), Issue 5

Detailed Study of Wine Dataset and its Optimization

Parneeta Dhaliwal
Department of Computer Science and Technology, Manav Rachna University, Faridabad Sector 43, Haryana 121001, India
E-mail: parneeta07@gmail.com

Suyash Sharma
Department of Computer Science and Technology, Manav Rachna University, Faridabad Sector 43, Haryana 121001, India
E-mail: suyashsharma9211@gmail.com

Lakshay Chauhan
Department of Computer Science and Technology, Manav Rachna University, Faridabad Sector 43, Haryana 121001, India
E-mail: lakshaychauhan100@gmail.com

Received: 07 March 2022; Revised: 16 June 2022; Accepted: 12 August 2022; Published: 8 October 2022

Abstract: Wine consumption is becoming more common at social gatherings, and maintaining the quality of wine is important for protecting the health of consumers. Many methods have been proposed for assessing wine quality. We describe a technique to pre-process the “Vinho Verde” wine dataset, which consists of red and white wine samples. The dataset has been reduced from a total of 13 attributes to 9 attributes without any loss of performance. This has been validated using several classification techniques: Random Forest, Decision Tree, K-Nearest Neighbor and Artificial Neural Network classifiers. The classifiers have been compared on two performance metrics, accuracy and RMSE. Among these classifiers, Random Forest tends to outperform the others on various measures for predicting the quality of the wine.

Index Terms: Machine Learning, Optimisation, Data Analytics, Wine dataset.

1. Introduction

In the current era of modernization and digitization, the amount of data generated is extremely high. Data has always been a source of information: when processed in suitable ways, it yields information that can be used for making future predictions [1]. With the Internet of Things (IoT) spreading across the world, the scope of automation is set to grow. As of 2012, about 2.5 exabytes of data were being created every day [2]. The amount of data processed on a day-to-day basis has also increased, reducing the efficiency of software and automation where proper data management techniques are lacking. Thus, data management tools are required for processing large amounts of data and extracting the relevant information efficiently [3].

Artificial intelligence (AI) and machine learning (ML) have shown rapid growth in recent years in the context of data analysis. New optimized computing techniques typically allow applications to function efficiently in the real world [4]. Machine learning [5], a branch of artificial intelligence, deals with training a machine on a particular dataset so that it can use what it has learned for future decision making. The trained system can imitate the way human beings learn and analyse, gradually increasing its predictive accuracy. In the current age of the fourth industrial revolution, the use of machine learning tools has spread across various industries, providing accurate decision making and lower processing time [6,7].

Various applications around the globe use data-driven decision making, such as facial recognition, e-commerce and business intelligence. Facial recognition means identifying a particular person from their facial features. It can help identify the perpetrator of an offense from CCTV footage [8]. Another application is e-commerce product recommendation, where the system recommends a product to a customer based on their earlier shopping experience [9].
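The idea of training on a dataset and then predicting for unseen samples, together with the two metrics named in the abstract (accuracy and RMSE), can be illustrated with a minimal sketch. It makes several assumptions that are not from the study: the two features per sample, all feature values and quality labels are made-up toy numbers, and the hand-written k-nearest-neighbour vote (`knn_predict`) is a hypothetical helper, not the implementation used by the authors.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label for x by majority vote among its k nearest training samples."""
    # Rank training samples by Euclidean distance to x and keep the k closest.
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in neighbours]
    # Majority vote; ties are broken in favour of the smaller label.
    return max(sorted(set(votes)), key=votes.count)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted labels."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (illustrative values only, NOT taken from the wine dataset):
# each sample is a pair of hypothetical feature values, each label a quality score.
train_X = [(9.4, 0.70), (9.8, 0.88), (9.5, 0.66),
           (11.2, 0.28), (12.0, 0.30), (11.5, 0.32)]
train_y = [5, 5, 5, 7, 7, 7]
test_X = [(9.6, 0.75), (11.8, 0.31), (10.4, 0.50)]
test_y = [5, 7, 6]

# "Training" for k-NN is just storing the samples; prediction does the work.
preds = [knn_predict(train_X, train_y, x) for x in test_X]
acc = accuracy(test_y, preds)
err = rmse(test_y, preds)
print(preds, round(acc, 3), round(err, 3))
```

Accuracy counts only exact matches, whereas RMSE also reflects how far a wrong prediction lies from the true quality score, which is why the two metrics can rank classifiers differently.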
Data analytics [1] can be predictive, descriptive, diagnostic or prescriptive. Descriptive analysis determines the current state of the system and presents current and previous data as graphical or statistical output [1,10]. To find out “Why is something happening” [11] or “Why did it happen”, we need diagnostic analysis, which goes deep into the data to