Evaluation of IR Applications with Constrained Real Estate

Yuanhua Lv, Ariel Fuxman, and Ashok K. Chandra

Microsoft Research, Mountain View, CA USA, 94043
{yuanhual, arielf, achandra}@microsoft.com

Abstract. Traditional IR applications assume that there is always enough space ("real estate") available to display as many results as the system returns. Consequently, traditional evaluation metrics were typically designed to take a length cutoff k of the result list as a parameter. For example, one computes DCG@k, Prec@k, etc., based on the top-k results in the ranking list. However, there are important modern ranking applications where the result real estate is constrained to a small fixed space, such as search verticals aggregated into Web search results and recommendation systems. For such applications, the following tradeoff arises: given a fixed amount of real estate, shall we show a small number of results with rich captions and details, or a larger number of results with less informative captions? In other words, there is a tradeoff between the length of the result list (i.e., quantity) and the informativeness of the results (i.e., quality). This tradeoff has important implications for evaluation metrics, since it makes the length cutoff k hard to determine a priori. In order to tackle this problem, we propose two desirable formal constraints to capture the heuristics of regulating the quantity-quality tradeoff, inspired by the axiomatic approach to IR. We then present a general method to normalize the well-known Discounted Cumulative Gain (DCG) metric for balancing the quantity-quality tradeoff, yielding a new metric that we call Length-adjusted Discounted Cumulative Gain (LDCG). LDCG is shown to automatically balance the length and the informativeness of a ranking list without requiring an explicit parameter k, while still preserving the good properties of DCG.
Keywords: Evaluation, Aggregated Search, Constrained Real Estate, Quantity-Quality Tradeoff, LDCG, LNDCG

1 Introduction

Evaluation metrics play a critical role in the field of information retrieval (IR). Traditional IR applications assume that there is always enough space ("real estate") available to display as many results as the system returns. To evaluate such systems, traditional evaluation metrics were typically designed to take a length cutoff k of the result list as a parameter. For example, one computes DCG@k [8], Prec@k, etc., based on the top-k results in the ranking list.
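For concreteness, the cutoff-based metrics named above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the exponential-gain DCG variant shown here is one common formulation of the metric introduced in [8], and the graded relevance labels are assumed inputs:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@k with exponential gain: sum over the top-k positions of
    (2^rel_i - 1) / log2(i + 1), where i is the 1-based rank.
    `relevances` is the list of graded judgments in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def precision_at_k(relevances, k, threshold=1):
    """Prec@k: the fraction of the top-k results judged relevant,
    i.e., with a relevance grade of at least `threshold`."""
    return sum(1 for rel in relevances[:k] if rel >= threshold) / k
```

Both metrics depend explicitly on k: the same ranking can score very differently under DCG@3 versus DCG@10, which is precisely why a fixed a-priori cutoff is problematic when the available real estate, rather than a list length, is the binding constraint.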