Big Data Analytics of Cyber Attacks: A Review

M V Suraj, CSE Department, MANIT Bhopal, India, mvsuraj2992@gmail.com
Nikhil Kumar Singh, CSE Department, MANIT Bhopal, India, nikhilsinghmanit@gmail.com
Deepak Singh Tomar, CSE Department, MANIT Bhopal, India, deepaktomarmanit@gmail.com

Abstract: Cyber crime over big data is expanding at an unprecedented rate, badly affecting the Internet industry and global data. The increasingly sophisticated attack and offensive methods used by cyber attackers, together with the growing role of data-driven and intelligence-driven adversaries, demonstrate that traditional approaches to mitigating cyber threats are becoming ineffective. Big data is also freely available across wide areas such as marketing, criminal activity and fraud detection, and epidemic intelligence. Analyzing big data on traditional databases is tedious, so NoSQL databases have been developed as a scalable platform to store and process large amounts of data. In this paper, different types of attacks on big data, its storage mechanisms, and real-time analytics approaches are discussed. Also, taking social media data as big data, the paper studies different social network attacks and provides a problem statement based on the study.

Index Terms: Big data, NoSQL, cyber-attack, social network data, analytics, security.

I. INTRODUCTION

Today, data is being generated at an unprecedented scale from a wide range of sources. Data has changed a lot over the last few years, and new strategies are required to manage such huge volumes and to cope with the increasing demand to deal with terabytes, petabytes, and now zettabytes. This enormous generation of data has ushered in a new era of data management, often referred to as Big Data [1]. The Big Data environment is huge, complex, unstructured, and heterogeneous, and it contains incomplete and noisy information, which may change the traditional statistical and data analysis approaches.
However, although big data seems to make it feasible to collect more data from which to extract helpful information, more data does not necessarily mean more helpful information [2]. As data volumes and the demand for real-time processing increase, massive storage space in a distributed environment will be needed to provide high availability and scalability. Giant companies involved in cloud computing, such as Google, Amazon, and Facebook, cannot handle the huge amount of data behind their business models using traditional relational databases. The traditional relational schema is of less use for such applications, and shifting to a NoSQL database seems a much better approach.

A NoSQL ("Not Only SQL") database provides a flexible mechanism for the storage and retrieval of data and generally does not use SQL for data manipulation. NoSQL database systems are useful for handling huge quantities of data when the nature of the data does not need a relational model [3]. NoSQL systems provide the ability to scale out throughput horizontally over many servers. They are generally schema-free and open-source and offer easy replication support.

However, this popularity and rapid growth have come at an exorbitant cost: the loss of information and resources due to cyber threats and attacks. Cyber attacks such as injection and malware are very common in the Big Data environment. Threats are becoming more advanced with the emergence of Advanced Persistent Threats (APTs), social engineering, ransomware, and fraud committed through digital identity theft [4]. Social media data is easily available online, so there is always room for exploitation. Social networks are being targeted to obtain personal information and to achieve financial gain in unethical ways. Attacks on social networks through identity theft, cyberbullying, and phishing are very common. As time passes, threat actors are multiplying, and today's systems are vulnerable to armies of hackers and state-sponsored initiatives.
To counter these more sophisticated attacks, organizations are increasingly exploring new approaches to cyber security. Rather than relying only on serially scanning potential attack vectors, organizations are seeking to implement systems that enable continuous monitoring and data collection from their infrastructure. Behavioral analytics and machine learning can be applied to this data in real time to create intelligent insights that enable not just detection of and response to threats, but also prediction of threats before systems are breached.

This paper discusses different types of attacks on big data, its storage mechanisms, and real-time analytics approaches. Also, taking social media data as big data, it studies different social network attacks and provides a problem statement based on the study.

II. BIG DATA AND NOSQL

Data in a Big Data environment cannot be handled and processed by traditional systems because the data volume is too big to be loaded into a single machine. Also, most traditional data analytics tools, which were developed for a centralized data analysis process, cannot be applied directly to big data. Big Data also contains more abnormal or ambiguous data. For instance, a user may have multiple accounts, or an account may be used by many users, which may degrade the accuracy of mining results. Therefore, many new issues for data analytics are coming up, such as privacy, fault tolerance, security issues,