A generic ranking function discovery framework by genetic programming for information retrieval ☆

Weiguo Fan a,*, Michael D. Gordon b, Praveen Pathak c

a Department of Accounting and Information Systems, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
b Department of Computer and Information Systems, University of Michigan, Ann Arbor, MI 48105, USA
c Department of Decision and Information Sciences, University of Florida, Gainesville, FL 32611, USA

Received 31 January 2003; accepted 8 August 2003

Information Processing and Management xxx (2003) xxx–xxx (article in press)
www.elsevier.com/locate/infoproman
0306-4573/$ - see front matter © 2003 Published by Elsevier Ltd.
doi:10.1016/j.ipm.2003.08.001

☆ An earlier version of this paper was presented at the 2000 International Conference on Information Systems by Fan, Gordon, and Pathak (2000).
* Corresponding author. E-mail addresses: wfan@vt.edu (W. Fan), mdgordon@umich.edu (M.D. Gordon), praveen@ufl.edu (P. Pathak).

Abstract

Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.

© 2003 Published by Elsevier Ltd.

Keywords: Information retrieval; Ranking function; Genetic algorithms; Genetic programming; Text mining

1. Introduction

The information retrieval (IR) field is undergoing dramatic development and change due to advances in information technology and computation techniques. The large amount of digital information increasingly available in our society makes information retrieval research one of the