A Deep Learning Approach for Categorizing Risk Impact in Software Domain

Baala Mithra SM, Global Technology Office, Cognizant Technology Solutions, Chennai, India, baalamithra.sm@cognizant.com
Kuhelee Roy, Global Technology Office, Cognizant Technology Solutions, Chennai, India, Kuhelee.Roy@cognizant.com
Sanglap Sarkar, Global Technology Office, Cognizant Technology Solutions, Chennai, India, sanglap.sarkar@cognizant.com
Venkateshwar Rao Madasu, Global Technology Office, Cognizant Technology Solutions, Chennai, India, Venkateshwar.madasu@cognizant.com
Subrahmanya VRK Rao, Global Technology Office, Cognizant Technology Solutions, Chennai, India, Subrahmanyavrk.rao@cognizant.com
Raj Bala, Global Technology Office, Cognizant Technology Solutions, Chennai, India, Raj.Bala@cognizant.com

Abstract - This paper addresses the problem of identifying the impact areas of a risk from a given text description of that risk. The challenge lies in the fact that the description is written in natural language. The literature offers a wide range of frameworks in which statistical machine learning techniques are used to predict risk from quantitative features. This work instead views the problem from a natural language processing perspective. To classify the risk impact category more accurately, we use a deep learning paradigm.

Keywords - n-grams, convolution, deep learning, backpropagation, bag of words

I. INTRODUCTION

In the software domain, the task of risk prediction requires analysis of historical data from similar projects in order to assess two vital things, viz., the probability that the objectives of the project will be reached, and the probability that the objectives have actually been reached when certain risks have occurred. According to [1], a vital component of a risk management process is identifying and analyzing the risk data.
The risk data pertaining to a software development project usually contains attributes such as category, exposure, stage, and impact area [2]. Traditional machine learning techniques such as Neural Networks and Support Vector Machines have been used effectively, in combination with evaluation metrics, to compute the prior and posterior probabilities of project failure and success. The influx of modern technology necessitates rapid progress in the software development that supports it. Identifying risks and the related impact areas requires analyzing the historical data along with the risk stage, exposure and status. Finding the relationships between the attributes that characterize a risk is a critical part of analyzing historical risk data in the software domain. Studies on the impact of risk factors in large-scale IT projects are provided in [3] [4]. A list of current software risk items is provided in [5]. The main idea behind analyzing historical risk data and predicting the future impact of risk is to help anticipate and avoid problems before they occur [6].

Unlike other works in the literature, this work treats the problem of predicting the impact of risk from a natural language processing perspective. The challenge lies in the fact that traditional POS tagging and chunking techniques for processing natural language sentences do not suffice for the current problem. To attain greater accuracy, a deep learning methodology has been used.

II. RELATED WORK

A few existing frameworks for risk management are provided in [7-9]. According to [10], budget, schedule, and technical qualities are the important factors used to evaluate whether the project objectives are met. On the other hand, according to [11], the success of risk analysis depends on the way a risk is described. In [1], metrics such as Domain, KSLOC, and Complexity have been used to obtain the impact areas of risk.
The issue of delay risk is dealt with in [12]. Machine learning techniques have been used to learn the relationships among the various attributes characterizing a risk [1] [2]. In [13], the chance of unforeseen circumstances leading to failure or damage, in monetary terms, is dealt with. Prioritizing risks and classifying their impact as high, medium or low was also considered as part of risk history analysis. Classifying risks from low to high, including the use of a questionnaire, was highlighted in [14]. Though fair accuracy rates have been achieved with machine learning approaches for analyzing the history of risks and their impact, a better approach that captures semantic features as well as other metrics is still an open area of concern.

2015 Seventh International Conference on Computational Intelligence, Modelling and Simulation. 2166-8523/15 $31.00 © 2015 IEEE. DOI 10.1109/CIMSim.2015.31

Unlike shallow learning approaches, where the features are learned explicitly and classified, deep learning