International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume: 2, Issue: 10, 3077 – 3080
IJRITCC | October 2014, Available @ http://www.ijritcc.org

Effective And Efficient Approach for Detecting Outliers

M. Sowmya Tanuja, Computer Science Engineering, UCEK, Kakinada, Andhra Pradesh, India, e-mail: tanuja1225@gmail.com
A. Krishna Mohan, Associate Professor, Computer Science Engineering, UCEK, Kakinada, Andhra Pradesh, India, e-mail: Krishna.ankala@gmail.com

Abstract: Anomaly detection is currently a major topic in machine learning research. It is the process of identifying unusual behavior, and it is widely used in data mining applications such as medical informatics, computer vision, computer security, and sensor networks. Statistical approaches assume that the data follow a predetermined distribution and aim to find the outliers which deviate from such distributions. Most distribution models are assumed to be univariate and thus lack robustness for multidimensional data. We propose an online and conditional anomaly detection method based on oversampling PCA (osPCA); combined with the leave-one-out (LOO) strategy, oversampling amplifies the effect of outliers. We can then use the variation of the dominant principal direction to identify the presence of rare but abnormal data. For conditional anomaly detection, expectation-maximization algorithms are used to learn the model. Our approach reduces computational costs and memory requirements.

Keywords: PCA; LOO strategy; online updating; power method; anomaly detection

I. INTRODUCTION

Anomaly detection aims to identify outliers that deviate from the existing data.
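As a rough illustration of the idea stated in the abstract, the dominant principal direction can be computed once for the data set and once more after a target instance has been duplicated (oversampled); a large change between the two directions marks the target as suspicious. The following is a minimal Python sketch, not the authors' implementation. It assumes the score is one minus the absolute cosine between the two directions, recomputes the covariance from scratch with a plain power iteration instead of any online update, and uses hypothetical names (`dominant_direction`, `ospca_scores`, `ratio`) and synthetic data.

```python
import random


def covariance(rows):
    """Covariance matrix (d x d) of a list of d-dimensional points."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n for j in range(d)]
        for i in range(d)
    ]


def dominant_direction(rows, iterations=200):
    """Approximate the dominant principal direction by power iteration."""
    cov = covariance(rows)
    d = len(cov)
    v = [1.0 / d ** 0.5] * d  # fixed start vector keeps the sketch deterministic
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def ospca_scores(rows, ratio=0.1):
    """Score each point by how far oversampling it rotates the dominant direction."""
    copies = max(1, round(len(rows) * ratio))  # assumed oversampling amount
    u = dominant_direction(rows)
    scores = []
    for target in rows:
        augmented = rows + [target] * copies  # duplicate the target instance
        u_t = dominant_direction(augmented)
        # The sign of a direction is arbitrary, so compare by absolute cosine.
        cosine = abs(sum(a * b for a, b in zip(u, u_t)))
        scores.append(1.0 - cosine)
    return scores


# Synthetic data (assumed): 200 points spread along the x-axis plus one far-off point.
rng = random.Random(0)
data = [[rng.gauss(0.0, 5.0), rng.gauss(0.0, 0.5)] for _ in range(200)]
data.append([6.0, 12.0])

scores = ospca_scores(data, ratio=0.1)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(suspect, scores[suspect])
```

On this synthetic set the appended point should receive the largest score, because duplicating it pulls the dominant direction away from the x-axis far more than duplicating any of the other points does. Recomputing the direction for every instance is costly, which is the kind of overhead an online updating scheme is meant to avoid.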
Often a small number of data instances differ from the other observations, and such instances can cause serious problems in the real world. In practice, anomaly detection is widely used in homeland security, cyber-security, intrusion detection, credit card fraud detection, etc. Some data that cause severe problems cannot be identified in advance, so online anomaly detection is introduced to detect anomalies or outliers in such unseen, irregular data.

In this paper we also use conditional anomaly detection. There are several data attributes that a human cannot directly identify as anomalous, and accuracy may suffer if all data attributes are considered equally. For that reason we introduce conditional anomaly detection; without it, all attributes are treated equally and anomalies are not detected with high accuracy.

In some conditions outliers are not identified correctly in large data sets; for example, the data mean and the least-squares solution for linear regression are fragile to outliers. For that reason osPCA (online oversampling principal component analysis) is used. It uses the principal direction to detect anomalies: we calculate the principal directions obtained when a data instance is added and when it is removed, and by comparing those principal directions we can easily identify the anomalies. The LOO strategy is used for calculating the principal direction: we take the principal direction with and without the target instance, and the variation between the two directions indicates whether the instance is an anomaly. dPCA-based anomaly detection can also be considered, but it does not work well for large data sets. For conditional anomaly detection we use an EM-based model, which defines an anomaly as an instance whose indicator attributes are not consistent with its environmental attributes.

II.
BACKGROUND AND RELATED WORK

This work mainly deals with outliers that are present in real-world data. Many researchers have introduced techniques and methods, but these are confined to small amounts of data and cannot handle large data sets. For that reason we introduce a new technique that supports large amounts of real-world data, such as credit card fraud and network intrusion data. In this approach we first clean the contaminated data; after cleaning, pattern extraction is applied; once the pattern is extracted, we use principal directions to detect whether the data contain errors.

Figure 1: Framework of our approach

III. EXISTING SYSTEM

There are three major approaches for detecting anomalies: distribution-based (statistical), distance-based, and density-based methods. The distribution approach assumes a predetermined standard distribution and is suitable only for data that follow that kind of distribution. In the distance-based approach we calculate the distances between data instances and use those distances to identify the anomalies. In the density-based approach the local outlier factor (LOF) is used. LOF diagnoses the outlierness