Unsupervised Feature Selection Using Evolutionary Algorithms

World Journal of Research and Review (WJRR), ISSN 2455-3956, Volume 3, Issue 1, July 2016, Pages 38-42. www.wjrr.org

Abstract: Classification is a central problem in data mining and machine learning: given a training set of labelled instances, the task is to build a model (a classifier) that can predict the class of new, unlabelled instances. Data preparation is crucial to the data mining process; its aim is to improve the fitness of the training data so that the learning algorithms produce more effective classifiers. Searching for frequent patterns within a specific sequence has become a much-needed task in various sectors; most recent works are based on techniques such as the Apriori algorithm, GSP, and MacroVspan, but frequent pattern mining can still be made more efficient. Two widely applied data preparation methods are feature selection and instance selection, which fall under the umbrella of data reduction. Feature selection is the task of selecting an optimal subset of features. It is used to reduce high-dimensional data and is applied in several domains such as medicine, image processing, and text mining. Several methods have been introduced for unsupervised feature selection, some based on the filter approach and some on the wrapper approach. In existing work, unsupervised feature selection methods using the Genetic Algorithm, the Bat Algorithm, and Ant Colony Optimization have been introduced, and these methods yield better performance for unsupervised feature selection. We propose a novel method to select a subset of features from unlabelled data using the binary bat algorithm with the sum of squared error as the fitness function.

Index Terms: frequent pattern; pattern indexing; HashMap; ASCII byte-encoding; unsupervised feature selection; binary bat algorithm; K-means; Ant Colony Optimization (ACO); data mining; classification; data reduction; instance selection.

I.
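The abstract proposes scoring candidate feature subsets by the sum of squared error (SSE) of a clustering computed over only the selected features. A minimal sketch of such a fitness function follows, assuming K-means as the clusterer (the paper lists K-means among its index terms); all function names here are illustrative, not taken from the paper:

```python
import numpy as np

def kmeans_sse(X, k=2, iters=25, seed=0):
    """Basic k-means; returns the sum of squared errors (SSE) of the clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest centre
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute centres, keeping the old centre if a cluster empties
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return float(((X - centers[labels]) ** 2).sum())

def subset_fitness(X, mask, k=2):
    """Fitness of a binary feature mask: SSE of k-means restricted to the
    selected columns (lower is better); empty subsets are ruled out."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return float("inf")
    return kmeans_sse(X[:, mask], k=k)
```

A binary bat algorithm (or any other binary metaheuristic) would then search over masks, keeping the mask that achieves the lowest `subset_fitness`.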
INTRODUCTION

In most applications we have a very large number of candidate features. Feature selection is the process of selecting a subset of the features available in a large set of data [1]. The best subset contains the smallest number of dimensions that still contribute to the accuracy of the model, obtained by removing irrelevant and redundant features. Feature selection is used to avoid the curse of dimensionality and to reduce the computational burden.

Feature selection in supervised learning is comparatively easy: since the class label is known, we can decide which features to keep based on that label. In unsupervised feature selection, however, no class label is available. This paper addresses the problem by using the binary bat optimization algorithm for feature selection.

Ms. Aishwarya Vishwas Deshpande, Student, Department of Information Technology, Savitribai Phule Pune University, Pune, India.
Ms. Sharvari Avinash Deshpande, Student, Department of Information Technology, Savitribai Phule Pune University, Pune, India.
Ms. Monika Kalyan Doke, Student, Department of Information Technology, Savitribai Phule Pune University, Pune, India.
Ms. Anagha Chaudhari, Assistant Professor, Department of Information Technology, Savitribai Phule Pune University, Pune, India.

II. SYSTEM ARCHITECTURE

Data mining is the process of extracting insightful knowledge from large quantities of data in an automated or semi-automated fashion. Evolutionary data mining, or genetic data mining, is an umbrella term for any data mining that uses evolutionary algorithms. Evolutionary algorithms work by emulating natural evolution. First, a random set of "rules" is generated over the training dataset; each rule tries to generalize the data into a formula. The rules are then checked: the ones that fit the data best are kept, while the rules that do not fit the data are discarded. The kept rules are then mutated and multiplied to create new rules.
This process iterates as necessary to produce a rule that matches the dataset as closely as possible. When such a rule is obtained, it is checked against the test dataset. If the rule still matches the data, it is valid and is kept; if it does not, it is discarded and the process begins again by selecting random rules.

Module 1: Database

Before a database can be mined using evolutionary algorithms, it first has to be cleaned, which means incomplete, noisy, or inconsistent data should be repaired. It is imperative that this be done before the mining takes place, as it helps the algorithms produce more accurate results. If the data comes from more than one database, the sources can be integrated, or combined, at this point. When dealing with large datasets, it may also be beneficial to reduce the amount of data being handled. One common method of data reduction works by getting a normalized sample of data from
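The generate-evaluate-select-mutate loop described under System Architecture above can be sketched in a few lines. This is an illustrative toy only: the bit-string rule encoding, population size, and fitness function below are our assumptions, not details from the paper.

```python
import random

def evolve(fitness, n_bits=8, pop_size=20, generations=60, seed=1):
    """Minimal generate-evaluate-select-mutate loop (a toy evolutionary sketch)."""
    rng = random.Random(seed)
    # generate: a random initial population of candidate "rules"
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # evaluate: best rules first
        survivors = pop[: pop_size // 2]      # select: discard the worst half
        children = []
        for parent in survivors:              # mutate survivors to refill the pool
            child = parent[:]
            child[rng.randrange(n_bits)] ^= 1  # flip one random bit
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy example: the "rule" that fits best is, by construction, the all-ones string.
best = evolve(lambda bits: sum(bits))
```

Because the best half of the population always survives, the top fitness never decreases; mutation supplies the new candidate rules each iteration, mirroring the keep/discard/mutate cycle described above.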