Detailed Analysis of Data Mining Tools

Rohit Ranjan
Pre-Final year student
Dept. of Computer Science & Engg.
Dayananda Sagar College of Engineering, Bangalore-78

Swati Agarwal
Pre-Final year student
Dept. of Computer Science & Engg.
Dayananda Sagar College of Engineering, Bangalore-78

Dr. S. Venkatesan
Professor
Dept. of Computer Science & Engg.
Dayananda Sagar College of Engineering, Bangalore-78

Abstract: - The rapid growth of technology and the adoption of new applications have driven a renaissance in the information technology sector and related fields. As a result of this advancement, collecting and warehousing data has become a necessity. This has led to the concept of data mining, which can be viewed as one of the emerging and promising technological developments. Data mining is the exploration and analysis of large quantities of data in order to extract implicit, previously unknown, and potentially meaningful patterns using tools and techniques. This paper presents a comprehensive, theoretical analysis of five open source data mining tools: Rapidminer, R, Knime, Orange, and Weka. The study presents the pros and cons of each tool together with its technical specifications, features, and specialization. This theoretical study is intended to make it easier to select the most suitable tool.

Keywords: Data Mining, Open Source, Dataset, R, Rapidminer, Knime, Orange, Weka.

INTRODUCTION: In this information age, with advances in technology and the means for mass digital storage, users typically collect and store all varieties of data, counting on the power of technology to help sort through this amalgam of information. These massive collections of data were initially stored in disparate structures, which led to the creation of structured databases. Efficient database management systems (DBMS) have been essential assets for managing large corpora of data.
The proliferation of DBMS has also contributed to the massive gathering of all varieties of information. Confronted with huge collections of data, the need of the hour is to make better managerial choices. The emergent needs are: automatic summarization of the data; extraction of the essence of the information stored; and discovery of patterns in raw data. Data mining is the most suitable answer to all of these needs.

Data mining is the computational process of discovering patterns in data stored in large data repositories or data warehouses, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The core step of data mining is to mine and discover novel information, in the form of patterns and rules, from large volumes of data. The key idea behind data mining is to design methods that work efficiently with large datasets. The following is a non-exclusive list of the varieties of information collected in digital form in datasets: business transactions, scientific data, medical and personal data, surveillance video and pictures, satellite sensing, games and virtual worlds (using CDA), text reports and memos (e-mail messages), and World Wide Web repositories.

To identify sequences and trends in data, data mining uses multiple algorithms and mathematical analysis (for efficient decision making). Data mining is frequently treated as a synonym for the knowledge discovery in databases (KDD) process. The following figure shows data mining as a step in an iterative KDD process.

Fig 1: Knowledge Discovery in Database Process

The knowledge discovery in databases process comprises a few steps leading from raw data collection to some form of new knowledge. The iterative process consists of the following steps:

Data Cleaning:- also known as data cleansing, this is the phase in which noisy and irrelevant data are removed from the collection.
Data integration:- at this stage, multiple data sources, often heterogeneous, may be combined into a common source.

International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 http://www.ijert.org IJERTV6IS050459 (This work is licensed under a Creative Commons Attribution 4.0 International License.) Published by: www.ijert.org Vol. 6 Issue 05, May - 2017 785