Towards a qualitative analysis of diff algorithms

Gioele Barabucci, Paolo Ciancarini, Angelo Di Iorio, Fabio Vitali
Department of Computer Science and Engineering, University of Bologna

Abstract. This paper presents ongoing research on the qualitative evaluation of diff algorithms and the deltas they produce. Our analysis focuses on qualities that are seldom studied: instead of evaluating the speed or the memory requirements of an algorithm, we focus on how natural, compact and fit for use in a given context the produced deltas are. This analysis started as a way to measure the naturalness of the deltas produced by JNDiff, a diff algorithm for XML-based literary documents. The deltas were considered natural if they expressed the changes similarly to how a human expert would, an analysis that could only be carried out manually. Our research efforts have since expanded into the definition of a set of metrics that are, at the same time, more abstract (so they capture a wider range of information about the delta) and completely objective (so they can be computed by automatic tools without human supervision).

1 Challenges in evaluating diff algorithms and deltas

Diff algorithms have been widely studied in the literature and applied to very different domains (source code revision, software engineering, collaborative editing, law making, etc.) and data structures (plain text, trees, graphs, ontologies, etc.). Their output is usually expressed as edit scripts, also called deltas or patches. A delta is a set of operations that can be applied to the older document in order to obtain the newer one. Deltas are hardly ever unique, since multiple different sequences of operations can be devised, all capable of generating the newer document from the older one. Each algorithm uses its own strategies and data structures to calculate the "best" delta.
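The non-uniqueness of deltas can be sketched in a few lines. The following is a minimal illustration under a hypothetical line-based delta model (the operation names and the `apply_delta` helper are ours, not taken from any real diff tool): two different sequences of operations both transform the same older document into the same newer one.

```python
def apply_delta(old, delta):
    """Apply a list of (op, index, lines) operations to a list of lines.
    A hypothetical, simplified delta model for illustration only."""
    doc = list(old)
    for op, index, lines in delta:
        if op == "insert":
            doc[index:index] = lines          # insert lines before position index
        elif op == "delete":
            del doc[index:index + len(lines)]  # remove len(lines) lines at index
    return doc

old = ["a", "b", "c"]
new = ["c", "a", "b"]

# Two different deltas, both capable of generating `new` from `old`:
delta1 = [("delete", 2, ["c"]), ("insert", 0, ["c"])]            # move "c" to the front
delta2 = [("delete", 0, ["a", "b"]), ("insert", 1, ["a", "b"])]  # move "a","b" to the back

assert apply_delta(old, delta1) == new
assert apply_delta(old, delta2) == new
```

Both deltas are "correct" in the sense that they reproduce the newer document, yet they describe the change very differently; which one is "best" depends on criteria beyond correctness.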
Some algorithms are very fast, others use a limited amount of memory, others are specialized for a specific domain and data format. Surprisingly enough, the evaluation of the quality of the deltas themselves has received little attention. The historical reason is that most algorithms have been proposed by the database community, which focused on efficiency rather than quality. Another reason is that the produced deltas are not easily comparable: not only does each algorithm choose different sequences of changes, but each also uses its own internal model and recognizes its own set of changes. For example, some algorithms detect moves while others do not, and the same name is sometimes used for different operations. Given this degree of heterogeneity, it is hard to evaluate the quality of these algorithms in an automatic and objective way. Nonetheless, we believe that such an evaluation is essential for the final users and can effectively support them in selecting the best algorithm for their needs.