Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations

Shlomo Argamon
Linguistic Cognition Laboratory
Department of Computer Science
Illinois Institute of Technology
Chicago, IL 60616
argamon@iit.edu

September 19, 2006

Abstract

While Burrows’s intuitive and elegant “Delta” measure has proven to be extremely useful for authorship attribution, a theoretical understanding of its operation has remained somewhat obscure. In this paper, I address this issue by introducing a geometric interpretation of Delta, which further allows us to interpret Delta as a probabilistic ranking principle. This interpretation gives us a better understanding of the method’s fundamental assumptions and potential limitations, and leads to several well-founded variations and extensions.

1 Introduction

In his 2001 Busa Award lecture, John F. Burrows (2003) proposed a new measure for authorship attribution which he termed “Delta”, defined as:

    the mean of the absolute differences between the z-scores for a set of word-variables in a given text-group and the z-scores for the same set of word-variables in a target text. (Burrows, 2002)

The measure assumes that some set of comparison texts is given, with respect to which z-scores are computed (based on the mean and standard deviation of word frequencies in the comparison corpus). The Delta measure is then computed between the target text and each of a set of candidate texts (generally comprising the comparison corpus), and the target is attributed to the author of the candidate text with the lowest Delta score. A number of literary authorship studies (Burrows, 2002, 2003; Hoover, 2004b, 2004a, 2005) have shown the Delta measure to be exceptionally useful for attribution, even with a large number of candidate authors (as long as genre is controlled for).
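To make the definition concrete, the following Python sketch computes Delta literally as quoted: z-scores are taken with respect to the comparison corpus, and the target is attributed to the candidate with the lowest mean absolute z-score difference. It is only an illustration under stated assumptions. The function names, the use of the sample standard deviation, and the toy relative frequencies are all hypothetical and are not taken from Burrows's studies.

```python
from statistics import mean, stdev


def z_scores(freqs, means, stdevs):
    """Standardize word frequencies against the comparison corpus."""
    return [(f - m) / s for f, m, s in zip(freqs, means, stdevs)]


def burrows_delta(target, candidate, means, stdevs):
    """Mean absolute difference between the two texts' z-scores."""
    zt = z_scores(target, means, stdevs)
    zc = z_scores(candidate, means, stdevs)
    return mean([abs(a - b) for a, b in zip(zt, zc)])


def attribute(target, candidates):
    """Return (best_author, scores); the lowest Delta wins.

    `candidates` maps an author label to a list of relative word
    frequencies, one entry per word-variable, in a fixed word order.
    Assumption: the candidate texts double as the comparison corpus,
    and every word has a nonzero standard deviation across them.
    """
    columns = list(zip(*candidates.values()))
    means = [mean(col) for col in columns]
    stdevs = [stdev(col) for col in columns]  # sample std. dev. (assumed)
    scores = {
        author: burrows_delta(target, freqs, means, stdevs)
        for author, freqs in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical relative frequencies of three frequent words.
candidates = {
    "A": [0.060, 0.030, 0.020],
    "B": [0.050, 0.040, 0.010],
    "C": [0.040, 0.020, 0.030],
}
target = [0.058, 0.031, 0.021]

best, scores = attribute(target, candidates)
print(best, scores)
```

On this toy data the target is attributed to candidate "A", whose frequencies lie closest to the target's once each word is scaled by its standard deviation in the comparison corpus.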
However, while Delta is a powerful new tool in the arsenal of the computational stylist, why it works so well has remained somewhat obscure. As Hoover (2005) states:

    In spite of the fact that Burrows’s Delta is simple and intuitively reasonable, it, like previous statistical authorship attribution techniques, and like Hoover’s alterations, lacks any compelling theoretical justification.