Classifying Common Vulnerabilities
and Exposures Database Using Text
Mining and Graph Theoretical Analysis
Ferda Özdemir Sönmez
Abstract Although the Common Vulnerabilities and Exposures (CVE) data is widely
known and used to record vulnerability descriptions, it lacks enough classifiers to
increase its usability. As a result, security tests focus on some well-known vulnerabilities
and neglect others. Better classification of this dataset would
result in finding solutions to a larger set of vulnerabilities/exposures. In this research,
vulnerability and exposure data (CVE) is examined in detail using both manual and
computerized content analysis techniques. Later, graph theoretical techniques are
used to scrutinize the CVE data. The computerized content analysis identified
94 concepts associated with the CVE records. The author was able to
relate these concepts to 11 logical groups. Using the network of relationships
among these 94 concepts in the graph theoretical analysis made it possible to
discover groups of contents and, thus, the CVE items that have similarities. Moreover,
the absence of some concepts pointed to problems related to CVE, such as delays in
the CVE review process or CVE not being preferred by some user groups.
Keywords Content analysis · Text mining · Graph theoretical analysis ·
Leximancer · Pajek · CVE · Common vulnerabilities and exposures
1 Introduction
The Common Vulnerabilities and Exposures (CVE) dictionary [1], which is also called
a dataset or database in some sources, is a large set of vulnerability and exposure
data that is considered the naming standard for vulnerabilities and exposures
in numerous security-related studies, books, and articles, and by the vendors of
security-related products, including Microsoft, Oracle, Apple, IBM, and many others. Despite
F. Ö. Sönmez (✉)
Informatics Institute, Middle East Technical University, Ankara, Turkey
e-mail: ferdaozdemir@gmail.com
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer
Nature Switzerland AG 2021
Y. Maleh et al. (eds.), Machine Intelligence and Big Data Analytics for Cybersecurity
Applications, Studies in Computational Intelligence 919,
https://doi.org/10.1007/978-3-030-57024-8_14