Classifying Software Changes: Safe or Vulnerable?

Trenton Miller, Krishna Kalubandi, Karthik Rajendran
University of California San Diego

Abstract—This paper uses a machine learning classifier to predict whether the software changes made as part of a particular commit are safe or prone to security vulnerabilities. The classifier is a support vector machine, which achieved a vulnerable-change prediction accuracy of 92% and a recall of 81% on the Mozilla Firefox web browser code base. The features supplied to the classifier were derived from the Software Configuration Management (SCM) system and from the committed code segments themselves. The classification results also yield further insights, such as the most crucial subset of features and the features contributing the most within each feature category. The output of this classifier can be used to help prevent zero-day vulnerabilities, reduce the likelihood of security breaches, and improve overall code quality.

Keywords—Change classification, support vector machines, software vulnerabilities, zero-day vulnerabilities.

I. INTRODUCTION

Zero-day vulnerabilities pose one of the most prominent threats to software security. By the time they are reported and fixed, many malicious users may already have exploited them. These vulnerabilities are usually introduced through subtle loopholes in the program structure that are not caught during the code review and testing phases. Most software companies follow a bi-weekly or monthly release cycle, so a significant amount of time elapses between the initial code commit and the release, and even more time passes before the first bug reports start coming in. This puts the developer in a difficult situation, as she must now reorient herself to the context of the original commit, which makes the bug-fixing process hard. A security bug or vulnerability fix puts even more pressure on the developer to come up with a patch.
To tackle this problem, developers use change classification tools [1], which flag potentially buggy code soon after a commit. We propose enhancements to this classifier so that instead of flagging general bugginess, it flags changes that can cause security vulnerabilities, which could in turn lead to zero-day exploits. We show the effectiveness of our classifier by testing it against Mozilla Firefox's browser code base. We use the same Support Vector Machine (SVM) classifier but feed in additional security-related features, which help it better classify security vulnerabilities. We also derive insights regarding the most crucial subset of features for classification and the features within each feature category that contribute the most to classification accuracy. This should help in designing feature sets for other code bases.

A. Major contributions

- Enhanced the existing software change classifier to flag code changes that could contain software security vulnerabilities instead of general bugginess.
- Achieved this by supplying the classifier with additional security-related features so as to increase classification accuracy on potentially vulnerable code changes.
- Evaluated the feature set to better understand which subset of features contributes significantly to a change being flagged as potentially vulnerable, and report our findings.

II. RELATED WORK

Related work exists in the domain of machine learning for predicting or classifying software changes as clean or buggy. Kim et al. [1] built a software change classifier that classifies code changes as buggy or clean. Using an SVM classifier and a broad set of features derived from the Software Configuration Management (SCM) system, they were able to classify software changes as buggy or clean with an average accuracy of 78% and a recall of 60%.
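The pipeline described above can be illustrated with a minimal sketch: commit-level feature vectors are fed to an SVM, and the model is scored on held-out commits using accuracy and recall. The feature names and toy data below are illustrative assumptions, not the paper's actual feature set.

```python
# Hypothetical sketch of SVM-based change classification.
# Feature columns (illustrative): lines added, lines deleted,
# files changed, change entropy.
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

X = [[120, 10, 3, 4.2], [5, 2, 1, 1.0], [300, 80, 9, 8.7], [12, 0, 1, 0.5],
     [45, 30, 4, 3.3], [2, 1, 1, 0.2], [210, 55, 6, 6.1], [8, 3, 2, 0.9]]
y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = vulnerable commit, 0 = safe commit

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
```

Recall matters here because a missed vulnerable commit (false negative) is far more costly than a spurious flag, which is why the paper reports both metrics.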
They report classification results across 12 different open source projects and also analyze which subset of features contributes the most to classifying a software change as buggy or not. Their work does not provide any insights specific to security-related vulnerabilities.

Gegick et al. [2] built a classifier that classifies bug reports as security-specific or not. They achieved a prediction accuracy of 98% when they ran their classifier against a large Cisco software system. They used features derived solely from the bug reports submitted to the bug tracking system, performing text mining on the natural-language descriptions of the reports. They do not use any features derived from the actual code change that caused the bug.

Zaman et al. [5] studied security versus performance bugs in the Mozilla Firefox browser repository. Their study shows that security bug fixes in general exhibit higher entropy (average number of lines changed per file among all files changed in a commit). We hypothesized that security-bug-inducing commits should also have a higher entropy value, and this helped improve the effectiveness of our model.

Shin et al. [7] did a similar study on whether code complexity metrics, code churn, and developer network effects help in predicting vulnerabilities. However, their goal was to predict the most vulnerable files per release candidate. Hence, some of their features, such as the number of authors collaborating on a file and author reputation, did not yield a considerable improvement in our commit-based model.

Both Zaman et al. [5] and Shin et al. [7] used MFSA (Mozilla Foundation Security Advisories) to search for vulnerability reports and tracked them back to Bugzilla and source