Vol-3 Issue-2 2017 IJARIIE-ISSN(O)-2395-4396 4027 www.ijariie.com 519

A REVIEW ON INSTANCE AND FEATURE SELECTION IN BIG DATA ENVIRONMENT

S. M. Matale 1, S. S. Banait 2
1 PG Student, Department of Computer Engineering, KKWIEER, Nashik, Maharashtra, India
2 Assistant Professor, Department of Computer Engineering, KKWIEER, Nashik, Maharashtra, India

ABSTRACT
Instance and feature selection have become effective approaches for coping with the enormous volumes of data continuously being produced in research. Many systems find such large datasets difficult to process. Although traditional techniques are useful, they face scaling problems when datasets grow to hundreds of thousands or millions of instances. The proposed work focuses on scalable instance and feature selection in a big data environment. Locality-sensitive hashing instance selection F (LSH-IS-F), a two-pass method, is used to find similar instances, and the Pearson correlation coefficient is used for feature selection. A family of hash functions is used; hashing is a general method of reducing the size of a set by re-indexing its elements into buckets. Because similar instances and features fall into the same bucket, the number of instances and features can be reduced. The work aims to improve the performance of locality-sensitive hashing by storing extra statistics about the instances and features assigned to each class in a bucket, and to improve the accuracy of the instance and feature selection algorithm through prototype generation.

Keywords: Big Data, data reduction, feature selection, hashing, instance selection

1. INTRODUCTION
Most data mining algorithms are designed for small datasets, ranging from a few thousand to several hundred thousand records. When applied to larger data, their efficiency degrades, which limits further processing. Today, datasets of millions of records are common, and the term Big Data emerged to describe them. Database sizes have grown considerably in recent years.
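The abstract describes the hashing-based reduction only in prose. The following minimal Python sketch illustrates the general idea under several assumptions that are not taken from this paper: a Gaussian (p-stable) Euclidean hash family with bucket width `w`, a single hash table built from `k` concatenated hash functions, a hypothetical `min_count` rule that treats small (bucket, class) groups as noise, centroids as generated prototypes, and a greedy Pearson-correlation threshold for feature selection. All function and parameter names are hypothetical; this is not the exact LSH-IS-F algorithm or the authors' implementation.

```python
import math
import random
from collections import defaultdict


def make_hash_family(dim, k, w, seed=0):
    """Build k Euclidean LSH functions h(x) = floor((a.x + b) / w).

    Assumption: Gaussian (p-stable) projections are one common LSH family;
    the paper does not fix this choice.
    """
    rng = random.Random(seed)
    funcs = []
    for _ in range(k):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, w)
        funcs.append((a, b))
    return funcs


def bucket_key(x, funcs, w):
    # Concatenate the k hash values into one bucket id (AND construction).
    return tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)
                 for a, b in funcs)


def lsh_instance_selection(X, y, k=4, w=1.0, min_count=2, seed=0):
    """Two-pass LSH instance selection with a single hash table (simplified).

    Pass 1 stores per-(bucket, class) statistics: member count and feature sums.
    Pass 2 keeps the first instance of every (bucket, class) group that has at
    least `min_count` members (hypothetical noise rule). Each kept group also
    yields a centroid prototype, a simple form of prototype generation.
    """
    funcs = make_hash_family(len(X[0]), k, w, seed)
    keys = [bucket_key(x, funcs, w) for x in X]

    # Pass 1: accumulate statistics for every (bucket, class) pair.
    counts = defaultdict(int)
    sums = {}
    for x, label, key in zip(X, y, keys):
        group = (key, label)
        counts[group] += 1
        previous = sums.get(group, [0.0] * len(x))
        sums[group] = [s + v for s, v in zip(previous, x)]

    # Pass 2: select one representative per sufficiently large group.
    selected, prototypes, seen = [], [], set()
    for i, (label, key) in enumerate(zip(y, keys)):
        group = (key, label)
        if counts[group] >= min_count and group not in seen:
            seen.add(group)
            selected.append(i)
            centroid = [s / counts[group] for s in sums[group]]
            prototypes.append((centroid, label))
    return selected, prototypes


def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def correlation_feature_selection(X, threshold=0.9):
    """Greedily keep features whose |r| with every already-kept feature is below threshold."""
    columns = list(zip(*X))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[m])) < threshold for m in kept):
            kept.append(j)
    return kept
```

For example, with rows `[[1, 2, 4], [2, 4, 1], [3, 6, 3], [4, 8, 2]]`, the second column is an exact multiple of the first, so `correlation_feature_selection` keeps only columns 0 and 2. In the instance selection sketch, a larger `k` or a smaller `w` makes buckets more selective, so fewer instances are merged; practical LSH schemes usually also combine several hash tables (the OR construction), which this sketch omits.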
Such large sizes pose serious challenges and limit the ability of machine learning algorithms to process these enormous volumes of data and information. The significance of big data lies not in the amount of data one has, but in what one does with it. Data from any source can be analyzed to find answers that enable 1) lower cost and reduced time, 2) product growth, 3) efficient offerings, and 4) smarter decision-making. Combining big data with high-capacity analytics makes it possible to accomplish business tasks such as finding defects, issues, and the root causes of failures; generating efficient offerings at the point of sale based on the customer's business practice; recalculating an entire risk analysis within minutes; and detecting problems before the behaviour of the organization is affected. The quantity of data being produced and stored worldwide is almost unimaginable, and it keeps rising. This means there is ever more potential to extract key insights from business data and information, yet so far only a portion of the data is actually used and analyzed. What does this mean for analysts, and what does it mean for businesses? How can organizations make good and efficient use of the raw data and information that flows into them every day? In today's competitive and complex business world, the various aspects of a business are intertwined, and organizations need to rely on data to back up their decisions. As large volumes of data are collected and stored in databases, the need for efficient and effective analysis and use of the information contained in the data has been growing. Data sets that