(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 8, 2021, www.ijacsa.thesai.org

Threat Analysis using N-median Outlier Detection Method with Deviation Score

Pattabhi Mary Jyosthna, Konala Thammi Reddy
Department of Computer Science and Engineering, GITAM (Deemed to be University), Visakhapatnam, India

Abstract—An organization can operate optimally only if all employees fulfil their roles and responsibilities. For the majority of tasks and activities, each employee must collaborate with other employees. Every employee must log activities related to their roles, responsibilities, and access permissions. Some users may deviate from their work or abuse their access rights to gain a benefit, such as money, or to harm the organization's reputation. Such users are known as insiders, and the threats they cause are insider threats. Detecting insiders after they have caused damage is more difficult than preventing them from posing a threat. We propose a method for determining how much a user deviates from other users in the same role group in terms of logged activities. Role managers can use this deviation score to double-check before sharing sensitive information or granting access rights to the entire role group. We first identify the abnormal users in each individual role and then use distance measures to calculate their deviation scores. We treat the identification of abnormal users in a large data space as an outlier detection problem. The user log activities are first converted into statistical summaries, the data is then normalized using Min-Max scaling, and PCA transforms the normalized data to a two-dimensional plane to reduce dimensionality. The results of N-Median Outlier Detection (NMOD) are then compared with those of neighbour-based and cluster-based outlier detection algorithms.
Keywords—Organizational roles; insider threats; outlier detection; deviation score

I. INTRODUCTION

In a distributed environment, resources such as infrastructure and data are distributed among the employees of an organization to achieve better performance and economic growth of the business. Security, however, becomes a major concern in such an environment, because a breach can cause unexpected loss of reputation or money. In general, security breaches originate either from outsiders, who have no right to access any of the organization's resources, or from insiders, who have legitimate rights to access the infrastructure within the organization [1]. Insiders may aim to gain money or sensitive data, or to disrupt the operation or functionality of an organization. Internal threats are harder to detect than external threats. According to the 2019 Insider Threat Report by Cybersecurity Insiders [2], 68% of organizations experience frequent insider threats. Insider threats can be caused deliberately or accidentally; accidental breaches may be due to careless or naïve users. Thirty percent of organizations use analytical tools, such as user activity management and summary reports, to determine insider threat details and reduce the loss caused by these threats. Organizations still need to respond quickly to attacks and should be able to identify or predict future threat possibilities.

Finding insiders is a very challenging task for organizations. Various Machine Learning (ML) approaches are evolving to tackle complex and challenging problems, and they can help to identify and predict malicious intent [4]. In general, a user is treated as an insider if he/she behaves differently from their own previous behaviour and from their peers' behaviour.
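The notion of deviating from peers' behaviour can be illustrated with a minimal Python sketch. It is not the NMOD algorithm; it only assumes that each user in one role is described by a few activity counts, scales the counts with Min-Max normalization, and takes the Euclidean distance from the role group's coordinate-wise median as a rough deviation value. The user identifiers, feature columns, and numbers are hypothetical, and the PCA step mentioned in the abstract is omitted to keep the sketch dependency-free.

```python
import math
from statistics import median

def min_max_scale(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)]
            for row in rows]

def peer_deviation(rows):
    """Euclidean distance of each row from the coordinate-wise median."""
    centre = [median(c) for c in zip(*rows)]
    return [math.dist(row, centre) for row in rows]

# Hypothetical activity counts for five users in the same role:
# (logons, thumb-drive connections, file accesses). Values are made up.
activity = {
    "u1": [20, 1, 30],
    "u2": [22, 0, 28],
    "u3": [19, 2, 33],
    "u4": [21, 1, 31],
    "u5": [60, 15, 120],  # deliberately unusual user
}

users = list(activity)
scaled = min_max_scale([activity[u] for u in users])
scores = dict(zip(users, peer_deviation(scaled)))
most_deviant = max(scores, key=scores.get)
```

In this toy data, `most_deviant` is the user whose counts are far from the rest of the role group. The median is used as the reference point here because, unlike the mean, it is not pulled towards the unusual user.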
The abnormal behaviour of an insider within his/her allotted role can be defined as the deviation score of a user. The behaviour of a user refers to his/her activities or computer system usage in the organization [3]. Researchers apply either classification or clustering algorithms, depending on the data they have gathered about insiders. If the dataset includes details of users' activities in insider threat incidents, researchers can use classification algorithms to build a model from that data. This model can then be used to classify whether new user activities could lead to an internal threat. If the data is about user roles and their activities within the organization, ML clustering algorithms can be used to cluster the users.

To support the analysis of historical data about insiders' activities, the Computer Emergency Response Team (CERT) Division, in partnership with Exact Data, LLC, and under sponsorship from the Defense Advanced Research Projects Agency (DARPA) I2O [5], generated a collection of synthetic insider threat test datasets that are publicly available. The CERT r6.1 dataset simulates an organization with 4000 users, recording activities such as login/logoff, thumb drive connectivity, and file access, together with the users' roles, over a period of 12 months.

The purpose of this paper is to apply existing outlier detection techniques to analyse user activities, which are assumed to be generated from different sources, and to propose a new N-Median Outlier Detection (NMOD) model to find role-wise outliers. Here, a role is a job role within the organization. The proposed model can do the following:

• Aggregate all log files generated from different monitoring tools based on the user activities in an organization.