A Deep Learning Approach for Categorizing Risk Impact in Software Domain
Baala Mithra SM
Global Technology Office
Cognizant Technology Solutions
Chennai, India
baalamithra.sm@cognizant.com
Kuhelee Roy
Global Technology Office
Cognizant Technology Solutions
Chennai, India
Kuhelee.Roy@cognizant.com
Sanglap Sarkar
Global Technology Office
Cognizant Technology Solutions
Chennai, India
sanglap.sarkar@cognizant.com
Venkateshwar Rao Madasu
Global Technology Office
Cognizant Technology Solutions
Chennai, India
Venkateshwar.madasu@cognizant.com
Subrahmanya VRK Rao
Global Technology Office
Cognizant Technology Solutions
Chennai, India
Subrahmanyavrk.rao@cognizant.com
Raj Bala
Global Technology Office
Cognizant Technology Solutions
Chennai, India
Raj.Bala@cognizant.com
Abstract - This paper addresses the problem of identifying the
impact areas of a risk from a given text description of the
risk. The challenge lies in the fact that the description is
written in natural language. The literature provides a wide
range of proposed frameworks in which statistical machine
learning techniques have been used to predict risk from
quantitative features. This work views the problem from a natural
language processing perspective. To achieve a more accurate
classification of the risk impact category, we have used
a deep learning paradigm.
Keywords - n-grams, convolution, deep learning,
backpropagation, bag of words
I. INTRODUCTION
In the software domain, risk prediction requires analyzing
historical data from similar projects in order to assess two
vital things, viz., the probability that the objectives of the
project will be reached, and whether the objectives were
actually reached when certain risks occurred. According to [1],
a vital component of a risk management process is identifying
and analyzing the risk data. The risk data pertaining to a
software development project usually contain attributes such as
category, exposure, stage, and impact area [2]. Traditional
machine learning techniques such as Neural Networks and
Support Vector Machines have been used effectively to compute
the prior and posterior probabilities of the failure and success
of the project. The influx of modern technology necessitates
correspondingly rapid progress in software development.
Identifying risks and the related impact areas requires
analyzing the historical data along with the risk stage,
exposure, and status. Finding the relationships among the
attributes characterizing a risk is a critical part of analyzing
historical risk data in the software domain. Studies on the
impact of risk factors in large-scale IT projects are
provided in [3], [4]. A list of current software risk items is
provided in [5]. The main idea behind analyzing historical
risk data and predicting the future impact of risk is to aid
in anticipating and avoiding problems prior to their
occurrence [6].
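The prior and posterior probabilities of project failure mentioned above can be related through Bayes' rule. The following is a minimal sketch only: the function name and all numerical values are hypothetical and chosen purely for illustration; they are not taken from this paper or from the cited works.

```python
from fractions import Fraction


def posterior_failure(prior_failure, p_risk_given_failure, p_risk_given_success):
    """Return P(failure | risk observed) using Bayes' rule."""
    prior_success = 1 - prior_failure
    # Total probability of observing the risk across both outcomes.
    evidence = (p_risk_given_failure * prior_failure
                + p_risk_given_success * prior_success)
    return p_risk_given_failure * prior_failure / evidence


# Hypothetical values, for illustration only.
prior = Fraction(1, 5)                 # assumed P(failure) before any risk is seen
likelihood_fail = Fraction(3, 5)       # assumed P(risk | failure)
likelihood_success = Fraction(1, 10)   # assumed P(risk | success)

posterior = posterior_failure(prior, likelihood_fail, likelihood_success)
print(f"prior={float(prior):.2f}, posterior={float(posterior):.2f}")
```

Under these assumed values, observing the risk raises the estimated probability of failure above its prior, which is the kind of update a probabilistic risk model would perform on historical data.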
Unlike other works in the literature, this work approaches the
problem of predicting the impact of risk from a natural language
processing perspective. The challenge lies in the fact that
traditional POS tagging and chunking techniques for
processing natural-language sentences do not suffice for
the current problem. To attain greater accuracy, a deep
learning methodology has been used.
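To make the natural language view concrete, the sketch below turns a free-text risk description into a bag of word n-grams, two of the notions listed in the keywords. It is an illustration under stated assumptions, not the pipeline used in this work: the function names (tokenize, ngrams, bag_of_ngrams), the tokenization rule, and the sample description are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every n-gram of length 1..max_n occurring in the text."""
    tokens = tokenize(text)
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag


# Invented risk description, for illustration only.
description = "Delay in vendor delivery may delay the release schedule"
bag = bag_of_ngrams(description)
print(bag.most_common(3))
```

A count vector of this kind discards word order beyond the n-gram window, which is one reason a learned representation may be preferred for descriptions written in free text.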
II. RELATED WORK
A few existing frameworks for risk management are
provided in [7-9]. According to [10], budget, schedule, and
technical qualities are the important factors used to evaluate
whether the project objectives are met. On the other hand,
according to [11], the success of risk analysis depends on
the way a risk is described. In [1], metrics such as Domain,
KSLOC, and Complexity have been used to obtain the impact
areas of risk. The issue of delay risk has been dealt with
in [12]. Machine learning techniques have been used for
learning the relationships among the various attributes
characterizing a risk [1], [2]. In [13], the chances of
unforeseen circumstances related to failure or damage in
monetary terms have been dealt with.
Prioritizing risks and classifying their impact as high,
medium, or low has also been considered as part of risk
history analysis. Classifying risks from low to high,
including the use of a questionnaire, was highlighted in [14].
Though fair accuracy rates have been achieved using
machine learning approaches for analyzing the history of risks
and their impact, a better approach that captures
semantic features as well as other metrics remains an open
concern. Unlike shallow learning approaches, where the
features are learned explicitly and classified, deep learning
2015 Seventh International Conference on Computational Intelligence, Modelling and Simulation
2166-8523/15 $31.00 © 2015 IEEE
DOI 10.1109/CIMSim.2015.31