Automatically Analyzing Groups of Crashes
for Finding Correlations
Marco Castelluccio
Mozilla
London, UK
University Federico II of Naples
Naples, Italy
marco.castelluccio@unina.it
Carlo Sansone
University Federico II of Naples
Naples, Italy
carlo.sansone@unina.it
Luisa Verdoliva
University Federico II of Naples
Naples, Italy
verdoliv@unina.it
Giovanni Poggi
University Federico II of Naples
Naples, Italy
poggi@unina.it
ABSTRACT
We devised an algorithm, inspired by contrast-set mining algorithms
such as STUCCO, to automatically find statistically significant
properties (correlations) in crash groups. Many earlier works
focused on improving the clustering of crashes but, to the best of
our knowledge, the problem of automatically describing properties
of a cluster of crashes is so far unexplored. This means developers
currently spend a fair amount of time analyzing the groups them-
selves, which in turn means that a) they are not spending their
time actually developing a fix for the crash; and b) they might miss
something in their exploration of the crash data (there is a large
number of attributes in crash reports and it is hard and error-prone
to manually analyze everything). Our algorithm helps developers
and release managers understand crash reports more easily and
in an automated way, helping in pinpointing the root cause of the
crash. The tool implementing the algorithm has been deployed on
Mozilla’s crash reporting service.
CCS CONCEPTS
· Software and its engineering → Software reliability;
KEYWORDS
Crashes; Crash Reports; Crash Analysis.
ACM Reference format:
Marco Castelluccio, Carlo Sansone, Luisa Verdoliva, and Giovanni Poggi.
2017. Automatically Analyzing Groups of Crashes for Finding Correlations.
In Proceedings of 2017 11th Joint Meeting of the European Software Engineering
Conference and the ACM SIGSOFT Symposium on the Foundations of Software
Engineering, Paderborn, Germany, September 4–8, 2017 (ESEC/FSE’17),
10 pages.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ESEC/FSE’17, September 4–8, 2017, Paderborn, Germany
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5105-8/17/09. . . $15.00
https://doi.org/10.1145/3106237.3106306
1 INTRODUCTION
Fixing crashes is one of the top priorities for software organizations,
as crashes are one of the main pain points for users and might lead
them to abandon the product. Even a single crash can dramatically worsen
how users perceive a piece of software, especially if it causes the loss
of important data. Acting quickly is thus essential to avoid losing users
and to maintain high software quality.
Several software organizations have deployed automated crash
reporting systems, such as Mozilla’s Socorro [1] and Windows Error
Reporting [12], which are used to collect reports from users at the
time of a crash. A report received by Socorro typically comprises
more than a hundred attribute-value fields. These reports are then
analyzed by dedicated personnel to find fixes and improve
software quality. However, these systems collect a huge number of
crash reports daily (about three hundred thousand reports per day
for Socorro), which cannot be processed on an individual basis.
Therefore, the typical workflow consists of two key phases:
(1) crash report clustering;
(2) cluster featuring and analysis.
The goal of clustering is to group together similar reports, as they
are likely caused by multiple instances of the same software
problem. Once the problem is fixed, all these reports can be
discarded at once from further analysis. Moreover, clustering allows
one to compute useful statistics on the cluster itself, enabling
the second phase of the workflow. In fact, the typical features of
interest in a cluster concern the frequency of occurrence of
attribute-value pairs, which may provide useful hints for the solution
of the problem. As an example, assume that a perfect clustering process
succeeds in grouping together all crash reports caused by a
given software bug, and assume also that all such reports are
characterized by a distinctive feature which is never observed in reports
of other clusters. While not conclusive, this observation would
provide a strong clue for the analyst, and would probably allow a quick
fix of the problem. This idealized process is summarized graphically
in Figure 1.
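The idealized scenario above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper, which relies on statistical significance in the spirit of contrast-set mining. It is only a naive comparison of how often each attribute-value pair occurs inside a cluster versus in all other reports. The attribute names (`platform`, `addon`), the toy reports, the function names, and the `min_gap` threshold are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def pair_frequencies(reports):
    """Fraction of reports that contain each (attribute, value) pair."""
    counts = Counter()
    for report in reports:
        # Each report is a flat dict of attribute -> hashable value.
        counts.update(report.items())
    total = len(reports)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def distinctive_pairs(cluster, others, min_gap=0.5):
    """Pairs whose frequency in the cluster exceeds the frequency
    elsewhere by at least min_gap (an arbitrary, assumed threshold)."""
    inside = pair_frequencies(cluster)
    outside = pair_frequencies(others)
    found = []
    for pair, freq_in in inside.items():
        freq_out = outside.get(pair, 0.0)
        if freq_in - freq_out >= min_gap:
            found.append((pair, freq_in, freq_out))
    # Largest frequency gap first.
    return sorted(found, key=lambda t: t[1] - t[2], reverse=True)


# Toy data: every report in the cluster has addon "X", no other report does.
cluster = [
    {"platform": "Windows", "addon": "X"},
    {"platform": "Linux", "addon": "X"},
    {"platform": "Windows", "addon": "X"},
]
others = [
    {"platform": "Windows", "addon": "none"},
    {"platform": "Linux", "addon": "none"},
    {"platform": "Windows", "addon": "none"},
    {"platform": "Mac", "addon": "none"},
]

for pair, freq_in, freq_out in distinctive_pairs(cluster, others):
    print(pair, f"in cluster: {freq_in:.0%}", f"elsewhere: {freq_out:.0%}")
```

On this toy data only the pair that appears in every clustered report and in no other report is flagged. The `platform` values are similarly frequent inside and outside the cluster, so they are not reported. A raw frequency gap of this kind ignores sample size, which is one reason a statistically grounded approach is preferable in practice.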