Ranked Recall: Efficient Classification by Learning Indices that Rank

Omid Madani
Yahoo! Research
3333 Empire Ave
Burbank, CA 91504, USA
madani@yahoo-inc.com

Michael Connor*
Department of Computer Science
University of Illinois at Urbana-Champaign
N. Goodwin Ave, Urbana, IL 61801, USA
connor2@uiuc.edu

A fundamental activity of intelligence is to efficiently detect to which of myriad categories a given entity belongs. The problem occurs in many incarnations and applications, including: (1) categorizing web pages into the Yahoo! topic hierarchy (http://dir.yahoo.com) [MGKS07, LYW+05], (2) prediction problems [Mad06, Goo01, EZR00], and (3) determining the visual categories for image tagging and object recognition [WLW01, FP03]. Furthermore, ideally we desire systems that efficiently learn to efficiently classify. In particular, we would like to ensure that both learning of categories and categorization of items be efficient in their usage of time and space. However, these tasks present a number of challenges for learning:

• Large or practically unbounded training sets.
• Large dimensionalities (thousands and beyond).
• Large numbers of categories (thousands and beyond).

In this work, we explore an approach based on learning an index of features into the categories. An index is a sparse weighted bipartite graph that connects each feature to zero or more categories. During classification, given an instance, the index is looked up much like a typical inverted index for document retrieval would be: active features of the instance are used for the index look up, and categories are retrieved and ranked by the scores that they obtain during retrieval. We term this process ranked recall (of categories). The ranking and the category scores can then be used for category assignment. We design our online algorithms to efficiently learn an index that accurately and efficiently ranks.
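
The index lookup described above can be illustrated with a minimal Python sketch. Several things here are assumptions made for illustration rather than details taken from this paper: the dict-of-dicts representation of the bipartite graph, the function name ranked_recall, the toy features and weights, and in particular the additive scoring rule (each category accumulates feature value times edge weight). Learning the index is not shown; the weights are simply given.

```python
from collections import defaultdict


def ranked_recall(index, instance, top_k=None):
    """Retrieve and rank categories for one instance.

    index:    {feature: {category: weight}}, a sparse weighted bipartite graph
    instance: {feature: value}, holding only the active (nonzero) features
    Scoring by summing value * weight is an illustrative assumption.
    """
    scores = defaultdict(float)
    for feature, value in instance.items():
        # Features absent from the index connect to zero categories.
        for category, weight in index.get(feature, {}).items():
            scores[category] += value * weight
    # Highest score first; ties are broken by category name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical toy index: three features, three categories.
toy_index = {
    "ball": {"sports": 0.5, "toys": 0.25},
    "goal": {"sports": 0.75},
    "price": {"shopping": 1.0, "toys": 0.25},
}
toy_instance = {"ball": 1.0, "goal": 1.0, "unseen": 1.0}

ranking = ranked_recall(toy_index, toy_instance)
print(ranking)
```

Only categories connected to at least one active feature are touched, so under these assumptions the cost of a lookup depends on the number of active features and their out-degrees in the index rather than on the total number of categories.
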
We compare against one-versus-rest and top-down (hierarchical)

* Portion of this research performed while the author was at Yahoo! Research.