International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 6 Issue: 3 187 - 191
IJRITCC | March 2018, Available @ http://www.ijritcc.org

Machine Learning based Traffic Classification using Statistical Analysis

Abirami Sivaprasad, Assistant Professor, IT-SAKEC, Mumbai, India. abi.lecturer@gmail.com
Neha Ghawalkar, IT-SAKEC, Mumbai, India. neha.ghawalkar2018@gmail.com
Srushti Hodge, IT-SAKEC, Mumbai, India. srushtihodge@gmail.com
Maitri Sanghavi, IT-SAKEC, Mumbai, India. maitris04@gmail.com
Vidhya Shinde, IT-SAKEC, Mumbai, India. vidhyass.shinde@gmail.com

Abstract— In this paper, an automated system is built that processes packets captured from the network. Machine learning algorithms are used to build a traffic classifier that labels packets as malicious or non-malicious. Traditionally, network packets were classified with dedicated tools; this work instead takes a machine learning approach, which is an open field to explore and has produced strong results so far. The main aim is to monitor traffic, analyze it, and control intruders. CTU-13, a dataset of botnet traffic, is used to develop a traffic classification system based on the features of packets captured on the network. This type of classification will help IT administrators detect the unknown attacks that are spreading across the IT industry.

Keywords—Data Mining, Machine Learning, Network Security, IDS, Attacks, Malicious, Classification.

I.
INTRODUCTION
In modern networks the volume of captured network data is growing exponentially, so there is a growing need to apply classification algorithms to the collected data set in order to separate malicious traffic from normal traffic. This type of classification is important for network monitoring systems and for handling security incidents [6]. Early approaches used well-known assigned port numbers to identify network traffic. For example, port 80 is used for HTTP communication and port 25 for SMTP communication. With the fast-growing internet, however, applications now use dynamically changing port numbers, which makes port-based traffic classification a tedious job. After port-based classification, payload-based inspection came into play. This classification can achieve good accuracy provided the payload can be accessed and inspected properly. Despite its good accuracy, payload-based classification has its own limitations: it is slow and consumes considerable resources. In the research community, some authors [1] proposed automatic mechanisms for deriving payload features and showed promising results [2], but these approaches still have their own limitations. The methodologies discussed there require a large amount of memory and processing time. If only the first few bytes of the payload are inspected, however, less memory and processing time are required [3]. As technology changes, the size of network data increases day by day, and researchers now use machine learning techniques that classify data based on its features. Machine learning algorithms build the classification model from a large data set and the features calculated from it.
[4] Moreover, features based on the statistical properties of network traffic are also becoming important for machine learning based classification, such as the packet length statistics of a traffic flow, for example the minimum, mean, maximum, and standard deviation of the packet sizes. Using machine learning (ML) techniques together with these calculated feature statistics, a good traffic classifier can be developed [5]. While ML classifiers have shown good efficiency and promising accuracy, their accuracy is often lower than that of payload-based classifiers (for traffic for which payload signatures exist).

A. AIM OF THE PROJECT
The aim of the proposed work is to perform traffic monitoring based on machine learning techniques, analyze network attack logs to identify intruders, and build a traffic classifier that distinguishes malicious traffic from normal traffic in the constructed data set. Capturing live data packets during a DoS attack will benefit the analysis of network-based classification.

B. OBJECTIVE OF THE PROJECT
1. Automated network data capturing and logging mechanism.
2. Feature extraction from captured data.
3. Data pre-processing engine to extract relevant and attack data features.
4. Data analysis and classification based on the “R” tool.
5. Performance measurement and result analysis.
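
The per-flow packet length statistics mentioned in the introduction (minimum, mean, maximum, and standard deviation of the packet sizes) can be illustrated with a minimal sketch. The paper carries out its analysis with the "R" tool; Python is used here only for illustration. The function name `flow_length_features`, the 5-tuple flow key, the use of the population standard deviation, and the sample packets are all assumptions made for this example and are not taken from the proposed system.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flow_length_features(packets):
    """Compute packet length statistics per flow.

    `packets` is an iterable of (flow_key, packet_length) pairs.
    The flow key is assumed to be a hashable 5-tuple
    (src_ip, dst_ip, src_port, dst_port, protocol); this layout is
    a hypothetical choice for the sketch.
    """
    # Group packet lengths by the flow they belong to.
    lengths_by_flow = defaultdict(list)
    for flow_key, length in packets:
        lengths_by_flow[flow_key].append(length)

    # Derive the four statistics named in the text for each flow.
    features = {}
    for flow_key, lengths in lengths_by_flow.items():
        features[flow_key] = {
            "min": min(lengths),
            "mean": mean(lengths),
            "max": max(lengths),
            # Population standard deviation (assumption); it is
            # defined even for a flow with a single packet.
            "std": pstdev(lengths),
        }
    return features


# Hypothetical sample data: two flows with made-up addresses and sizes.
web_flow = ("10.0.0.1", "10.0.0.2", 51000, 80, "TCP")
mail_flow = ("10.0.0.3", "10.0.0.4", 51001, 25, "TCP")
sample_packets = [
    (web_flow, 60),
    (web_flow, 1500),
    (web_flow, 60),
    (mail_flow, 100),
]

features = flow_length_features(sample_packets)
for flow_key, stats in features.items():
    print(flow_key, stats)
```

Each resulting dictionary corresponds to one row of a feature table that a classifier could be trained on; a real pipeline would add further features and the malicious or normal label for each flow.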