1551-3203 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2650206, IEEE Transactions on Industrial Informatics

Abstract—Next generation wireless networks are expected to operate in a fully automated fashion to meet the burgeoning capacity demand and to serve users with superior quality of experience. Mobile wireless networks can leverage spatio-temporal information about user and network conditions to equip the system with end-to-end visibility and intelligence. Big data analytics has emerged as a promising approach to unearth meaningful insights and to build artificially intelligent models with the assistance of machine learning tools. Using these tools and techniques, this paper makes two contributions. First, we use mobile network data (big data), specifically call detail records (CDRs), to analyze anomalous behavior of a mobile wireless network. For anomaly detection, we use unsupervised clustering techniques, namely k-means clustering and hierarchical clustering. We compare the detected anomalies with ground truth information to verify their correctness. From the comparative analysis, we observe that when the network experiences abruptly high (unusual) traffic demand at any location and time, the clustering flags it as an anomaly. This helps in identifying regions of interest (RoI) in the network for special action such as resource allocation or fault avoidance. Second, we train a neural-network based prediction model with anomalous and anomaly-free data to highlight the effect of anomalies in data while training/building intelligent models.
In this phase, we transform the anomalous data into anomaly-free data and observe that the prediction error is considerably lower when the model is trained with anomaly-free data than when it is trained with anomalous data.

Index Terms—Next generation wireless networks, 5G, anomaly detection, call detail record, machine learning, network analytics, network behavior analysis, wireless cellular network

Manuscript received October 31, 2016. This work was supported in part by the U.S. National Science Foundation (NSF) under Grants CNS-1650831 and CNS-1658972. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

The authors are with the Department of Electrical Engineering and Computer Science, Howard University, Washington, D.C. 20059, USA (e-mail: mdsalik.parwez@bison.howard.edu, db.rawat@ieee.org).

I. INTRODUCTION

Massive amounts of data and information are produced by and about people, things and their interactions. In the wireless communication industry, the major drivers of big data are the increasing number of smart devices, machine-to-machine (M2M) communications, and the penetration of social media. With the evolution of communication networks towards 5G, a multitude of technologies such as base station (BS) densification and massive multiple input multiple output (MIMO) are expected to increase the size of data exponentially. The data are generated at very large scale (volume), with an expected size of 24.3 exabytes (EB) per month [1]; with fast input/output to/from the network (velocity); from various sources within and outside the network (variety); and with the quality and trustworthiness of data that are available at an unprecedented level of volume, velocity and variety (veracity).
These unconventional 4V features (volume, velocity, variety, veracity) of current data generation give rise to big data, and its management and analysis therefore require big data analytics schemes [2].

Big data analytics is an umbrella term that covers the methods and technologies, hardware and software, for collecting, managing and analyzing large-scale structured and unstructured data in real time. Big data analytics works on the entire data set, as opposed to only sample data in conventional data analytics schemes. In the case of small data, analyses were performed on randomly selected samples (partial data) that were considered representative of the whole data. Because only partial data are analyzed, the information extracted is inaccurate and incomplete, and thus the decisions made and the performance achieved are sub-optimal. Especially in real network analysis and troubleshooting, precise and timely information is needed to provide an exact solution, which is only possible if the whole (big) data is analyzed.

For current and envisioned 5G mobile networks, big data analytics offers a number of benefits; some of them are outlined below [2].
a. Big data analytics offers end-to-end visibility of the wireless network
b. Big data analytics enables self-coordination among network functions and entities
c. Big data analytics enables assessment of the long-term dynamics of the network
d. Big data analytics builds a faster and more proactive network
e. Big data analytics enables smart and proactive caching in the wireless network
f. Big data analytics enables energy-efficient network operation
g.
Big data analytics would enable unified performance evaluation

Big Data Analytics for User-Activity Analysis and User-Anomaly Detection in Mobile Wireless Network
Md Salik Parwez, Student Member, IEEE, Danda B. Rawat, Senior Member, IEEE and Moses Garuba

Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

In a mobile wireless network, a number of network measurements and parameters are continuously exchanged among, reported by and gathered from the User Equipment (UE) and the nodes in the Radio Access Network (RAN) and core network of the Long Term Evolution-Advanced (LTE-A) network. Examples include call detail