Modeling Skewness in Vulnerability Discovery
HyunChul Joh^a,*,† and Yashwant K. Malaiya^b
A vulnerability discovery model attempts to capture the rate at which vulnerabilities are discovered in a software product. Recent studies have shown that the S-shaped Alhazmi–Malaiya Logistic (AML) vulnerability discovery model often fits better than other models and demonstrates superior prediction capabilities for several major software systems. However, the AML model is based on the logistic distribution, which assumes a symmetrical discovery process with a peak in the center. Hence, when the discovery process does not follow a symmetrical pattern, a discovery model based on an asymmetrical distribution might be expected to perform better. Here, the relationship between the performance of S-shaped vulnerability discovery models and the skewness of the target vulnerability datasets is examined. To study this possible dependence on skew, alternative S-shaped models based on the Weibull, Beta, Gamma, and Normal distributions are introduced and evaluated. The models are fitted to data from eight major software systems. The applicability of the models is examined using two separate approaches: a goodness-of-fit test to see how well the models track the data, and prediction capability measured by average error and average bias. It is observed that an excellent goodness of fit does not necessarily result in superior prediction capability. The results show that, when prediction capability is considered, all the right-skewed datasets are represented better by the Gamma distribution-based model. The symmetrical models tend to predict better for left-skewed datasets; among them, the AML model is found to be the best.
Keywords: data models; security; empirical studies; vulnerability discovery model (VDM); skewness
1. Introduction
Before software developers release a product to customers, it needs to satisfy not only the functional and technical requirements but also be sufficiently reliable and secure. After the release, developers must ensure that patches are available as soon as possible for the vulnerabilities that will be discovered. If software development managers can make accurate projections of the vulnerability discovery process, they can optimally allocate the resources likely to be needed for rapid patch development. A quantitative characterization of vulnerability discovery rates is necessary to assess the risks associated with the product. A vulnerability is defined as a defect or weakness in the security system which might be exploited by a malicious user, causing loss or harm [1]. A critical vulnerability can give an attacker the ability to gain full control of the system or leak highly sensitive information.
For non-security-related software defects, the most widely used reliability metrics are residual fault density and failure intensity [2]. These measures support data-driven quantitative analysis methods that developers can use to control development so as to achieve the target reliability levels. The software reliability growth models (SRGMs), which attempt to relate defect discovery to testing time, form a core part of the software reliability engineering discipline [3,4]. The vulnerability discovery models (VDMs) proposed recently are somewhat analogous to SRGMs, but there are significant differences. Vulnerabilities, which are security-related defects, tend to have a different profile than ordinary software defects [5,6]. Ordinary defects found after release are frequently ignored and not fixed until the next release because they do not represent a high degree of risk. Software developers, on the other hand, need to patch vulnerabilities right after they are found because of the high risks they represent. Security issues can greatly impact organizations such as banks, brokerage houses, on-line merchants, and government offices, as well as individuals.
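To make the notion of a VDM concrete, the following is a minimal sketch, not the authors' code, of fitting the S-shaped AML model, Ω(t) = B / (BCe^(−ABt) + 1), to cumulative vulnerability counts using nonlinear least squares. The monthly data are synthetic placeholders, and the parameter values and starting guesses are illustrative assumptions.

```python
# Minimal sketch of fitting the S-shaped AML vulnerability discovery model,
# Omega(t) = B / (B*C*exp(-A*B*t) + 1), to cumulative vulnerability counts.
# The data below are synthetic placeholders, not from any real product.
import numpy as np
from scipy.optimize import curve_fit

def aml(t, A, B, C):
    """Cumulative vulnerabilities discovered by time t under the AML model."""
    return B / (B * C * np.exp(-A * B * t) + 1.0)

months = np.arange(1, 37)  # three years of monthly observations
rng = np.random.default_rng(42)
true_curve = aml(months, A=0.002, B=120.0, C=0.5)
# Add noise, then force the series to be non-decreasing, as cumulative counts are.
observed = np.maximum.accumulate(true_curve + rng.normal(0.0, 2.0, months.size))

# Starting guesses are illustrative; real fits may need tuning to converge.
params, _ = curve_fit(aml, months, observed, p0=(0.001, 100.0, 0.3), maxfev=10000)
A_hat, B_hat, C_hat = params
print(f"A = {A_hat:.4f}, B = {B_hat:.1f} (est. total vulnerabilities), C = {C_hat:.3f}")
```

Here the fitted B is the model's estimate of the total number of vulnerabilities eventually discovered, which is what makes such fits useful for patch-resource planning.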
A quantitative analysis of the software vulnerability discovery process is required for optimizing testing, maintenance, and risk assessment of software systems, because quantitative methods provide actual data-driven analytical techniques. For a quantitative assessment to become feasible, the software systems need to have been around for a sufficiently long time, so that the related datasets are substantial enough to be analyzed [7,8].
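As an illustration of such data-driven analysis, the sketch below measures the skew of a discovery-time distribution by treating per-interval discovery counts as frequencies over time (the weighted third standardized moment). This estimator is an assumption for illustration rather than the paper's exact procedure, and the counts are hypothetical.

```python
# Minimal sketch: skewness of a vulnerability discovery-time distribution,
# computed from per-month discovery counts treated as frequencies over time.
# The counts are hypothetical; the estimator is an illustrative assumption.
import numpy as np

def discovery_skewness(times, counts):
    """Weighted third standardized moment of discovery times."""
    w = counts / counts.sum()
    mu = np.sum(w * times)
    m2 = np.sum(w * (times - mu) ** 2)
    m3 = np.sum(w * (times - mu) ** 3)
    return m3 / m2 ** 1.5

months = np.arange(1, 13)
counts = np.array([1.0, 3, 8, 14, 12, 9, 6, 4, 3, 2, 1, 1])  # early peak

g1 = discovery_skewness(months, counts)
print(f"skewness = {g1:+.3f}")
# Positive (right skew, early peak): the paper finds Gamma-based models
# predict such datasets better; negative (left skew) favours symmetric
# models such as AML.
```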
a School of General Studies, Gwangju Institute of Science and Technology, Gwangju, Korea
b Computer Science Department, Colorado State University, Fort Collins, CO 80523, USA
*Correspondence to: HyunChul Joh, School of General Studies, Gwangju Institute of Science and Technology, B-312 GIST College, 123 Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, Korea.
†E-mail: joh@gist.ac.kr