Classifying Software Changes: Safe or Vulnerable?

Trenton Miller, Krishna Kalubandi, Karthik Rajendran
University of California San Diego

Abstract—This paper uses a machine learning classifier to predict whether the software changes made as part of a particular commit are safe or prone to security vulnerabilities. The classifier is a support vector machine, which achieved a vulnerable-change prediction accuracy of 92% and a recall of 81% on the Mozilla Firefox web browser code base. The features supplied to the classifier were derived from the Software Configuration Management (SCM) system and from the committed code segments themselves. The classification results also yield further insights, such as the most crucial subset of features and the features contributing the most within each feature category. The output of this classifier can be used to help prevent zero-day vulnerabilities, reduce the likelihood of security breaches, and improve overall code quality.

Keywords—Change classification, support vector machines, software vulnerabilities, zero-day vulnerabilities.

I. INTRODUCTION

Zero-day vulnerabilities pose one of the most prominent threats to software security. By the time they are reported and fixed, many malicious users may already have exploited them. These vulnerabilities are usually introduced through subtle loopholes in the program structure that are not caught during the code review and testing phases. Most software companies follow a bi-weekly or monthly release cycle, so a significant amount of time elapses between the initial code commit and the release, and even more time passes before the first bug reports start coming in. This puts the developer in a difficult situation, as she must now reorient herself to the context of the original commit, which makes the bug-fixing process hard. A security bug or vulnerability fix puts even more pressure on the developer to come up with a patch.
To tackle this problem, developers use change classification tools [1], which flag potentially buggy code soon after a commit. We propose enhancements to this classifier so that instead of flagging general bugginess, it flags changes that can cause security vulnerabilities, which could in turn lead to zero-day exploits. We show the effectiveness of our classifier by testing it against Mozilla Firefox's browser code base. We use the same Support Vector Machine (SVM) classifier but feed in additional security-related features, which help it better classify security vulnerabilities. We also derive insights regarding the most crucial subset of features for classification and the features within each feature category that contribute the most to classification accuracy. This should help in designing feature sets for other code bases.

A. Major contributions

- Enhanced the existing software change classifier to flag code changes that could contain software security vulnerabilities instead of general bugginess.
- Achieved this by supplying the classifier with additional security-related features so as to increase classification accuracy on potentially vulnerable code changes.
- Evaluated the feature set to better understand which subset of features contributes significantly to a change being flagged as potentially vulnerable, and report our findings.

II. RELATED WORK

Related work exists in the domain of machine learning for predicting or classifying software changes as clean or buggy. Kim et al. [1] built a software change classifier that classifies code changes as buggy or clean. Using an SVM classifier and a broad set of features derived from the Software Configuration Management (SCM) system, they were able to classify software changes as buggy or clean with an average accuracy of 78% and a recall of 60%.
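The pipeline described above can be illustrated with a minimal sketch: commit-level feature vectors are fed to an SVM, and the model is scored on held-out commits using accuracy and recall. The feature names and toy data below are illustrative assumptions, not the paper's actual feature set.

```python
# Hypothetical sketch of SVM-based change classification.
# Feature columns (illustrative): lines added, lines deleted,
# files changed, change entropy.
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

X = [[120, 10, 3, 4.2], [5, 2, 1, 1.0], [300, 80, 9, 8.7], [12, 0, 1, 0.5],
     [45, 30, 4, 3.3], [2, 1, 1, 0.2], [210, 55, 6, 6.1], [8, 3, 2, 0.9]]
y = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = vulnerable commit, 0 = safe commit

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="linear").fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
```

Recall matters here because a missed vulnerable commit (false negative) is far more costly than a spurious flag, which is why the paper reports both metrics.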
They report classification results across 12 different open source projects and also analyze which subset of features contributes the most to classifying a software change as buggy or not. Their work does not provide any insights specific to security-related vulnerabilities.

Gegick et al. [2] built a classifier that classifies bug reports as security-specific or not. They achieved a prediction accuracy of 98% when they ran their classifier against a large Cisco software system. They used features derived solely from the bug reports submitted to the bug tracking system, performing text mining on the natural-language descriptions of the reports. They do not use any features derived from the actual code change that caused the bug.

Zaman et al. [5] studied security versus performance bugs in the Mozilla Firefox browser repository. Their study shows that security bug fixes in general exhibit higher entropy (average number of lines changed per file among all files changed in a commit). We hypothesized that security-bug-inducing commits should also have a higher entropy value, and this helped improve the effectiveness of our model.

Shin et al. [7] did a similar study on whether code complexity metrics, code churn, and developer network effects help in predicting vulnerabilities. However, their goal was to predict the most vulnerable files per release candidate. Hence, some of their features, such as the number of authors collaborating on a file and author reputation, did not yield a considerable improvement in our commit-based model.

Both Zaman et al. [5] and Shin et al. [7] used MFSA (Mozilla Foundation Security Advisories) to search for vulnerability reports and tracked them back to Bugzilla and source