Pictures of Relevance: A Geometric Analysis of Similarity Measures

William P. Jones
Microelectronics and Computer Technology Corporation, P.O. Box 200195, Austin, Texas 78720

George W. Furnas
Bell Communications Research, 435 South Street, Morristown, N.J. 07960

We want computer systems that can help us assess the similarity or relevance of existing objects (e.g., documents, functions, commands, etc.) to a statement of our current needs (e.g., the query). Towards this end, a variety of similarity measures have been proposed. However, the relationship between a measure's formula and its performance is not always obvious. A geometric analysis is advanced and its utility demonstrated through its application to six conventional information retrieval similarity measures and a seventh spreading activation measure. All seven similarity measures work with a representational scheme wherein a query and the database objects are represented as vectors of term weights. A geometric analysis characterizes each similarity measure by the nature of its iso-similarity contours in an n-space containing query and object vectors. This analysis reveals important differences among the similarity measures and suggests conditions in which these differences will affect retrieval performance. The cosine coefficient, for example, is shown to be insensitive to between-document differences in the magnitude of term weights, while the inner product measure is sometimes overly affected by such differences. The context-sensitive spreading activation measure may overcome both of these limitations and deserves further study. The geometric analysis is intended to complement, and perhaps to guide, the empirical analysis of similarity measures.

1. Introduction

The success of any information retrieval system depends upon its ability to accurately assess the relevance of objects (e.g., information units, documents, functions, commands, etc.) in its database to a given user's request.
Received September 4, 1986; revised May 29, 1987; accepted March 10, 1987.
© 1987 by John Wiley & Sons, Inc.

Towards this end, components of a ranking algorithm include a means of representing the user's request, a means of representing the objects of the database, and a means of measuring the similarity between these representations [1]. This article focuses on the last of these components, the ranking algorithm's similarity measure.¹

A wide range of similarity measures has been proposed (e.g., [1-4]) and the number of potential similarity measures is larger still [1]. Yet, surprisingly little is known about the comparative utility of these measures. Through a combined use of formal, algebraic manipulation and empirical simulation, McGill et al. [3] were able to reduce a set of 67 similarity measures to a set of 24 classes. The members of a class were judged either to be algebraically equivalent to one another or else to exhibit very high correlation in their judgments of similarity. However, the results of an experimental comparison of measures from various classes were inconclusive.

McGill et al. [3] (pp. 10-19) acknowledge a number of difficult methodological problems that severely limit our ability to generalize from empirical studies of similarity measure performance. A number of problems arise, for example, in the attempt to judge the precision and recall rate of a similarity measure (see [4], pp. 157-191). Moreover, no means has yet been proposed to systematically and economically manipulate various important factors of the information retrieval task, including: the nature of the database and its size, the nature of the user population and its information needs, and the procedures for arriving at representations of database objects and user requests.
It is quite possible that these factors interact with the choice of similarity measure, so that a performance ordering of similarity measures may change depending upon the choices that are necessarily made with respect to each of these factors. It is also possible that important differences among similarity measures are "averaged away" in studies that allow these factors to vary in an uncontrolled fashion.

¹The phrase "relevance measure" is perhaps a more accurate label for this component of a ranking algorithm, since its primary purpose is to assess an object's relevance to a user's request. The computer's representation of an object could conceivably bear very little resemblance to its representation of a user's request, and yet the object could still be very relevant. "Similarity measure" is used instead throughout this article to maintain consistency with its usage in the information retrieval literature.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 38(6): 420-442, 1987 CCC 0002-8231/87/060420-23 $04.00
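The abstract's contrast between the cosine coefficient and the inner product can be made concrete with a short sketch. The term-weight vectors below are hypothetical, chosen only for illustration: two documents point in the same direction as the query but differ in magnitude, so the cosine coefficient scores them identically while the inner product ranks the larger one higher.

```python
import math

def inner_product(q, d):
    # Inner product: sum over terms of query weight times document weight.
    return sum(qi * di for qi, di in zip(q, d))

def cosine(q, d):
    # Cosine coefficient: inner product normalized by the two vector magnitudes.
    return inner_product(q, d) / (
        math.sqrt(inner_product(q, q)) * math.sqrt(inner_product(d, d))
    )

query = [1.0, 1.0, 0.0]   # hypothetical term-weight vector for a query
doc_a = [1.0, 1.0, 0.0]   # document in the same direction as the query
doc_b = [5.0, 5.0, 0.0]   # same direction, five times the magnitude

# The cosine coefficient is insensitive to the magnitude difference:
print(cosine(query, doc_a), cosine(query, doc_b))                # both 1.0
# The inner product is dominated by it:
print(inner_product(query, doc_a), inner_product(query, doc_b))  # 2.0 vs. 10.0
```

Whether the cosine's indifference to magnitude is a virtue or a flaw depends on whether between-document magnitude differences carry information about relevance, a question the geometric analysis below takes up directly.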