Estimating Number of Citations Using Author Reputation

Carlos Castillo, Debora Donato, and Aristides Gionis

Yahoo! Research Barcelona
C/Ocata 1, 08003 Barcelona
Catalunya, SPAIN

Abstract. We study the problem of predicting the popularity of items in a dynamic environment in which authors continuously post new items and provide feedback on existing items. This problem can be applied to predict the popularity of blog posts, rank photographs in a photo-sharing system, or predict the citations of a scientific article, using author information and monitoring the items of interest for a short period of time after their creation. As a case study, we show how to estimate the number of citations for an academic paper using information about past articles written by the same author(s). If we use only the citation information over a short period of time, we obtain a predicted value that has a correlation of r = 0.57 with the actual value. This is our baseline prediction. Our best-performing system improves that prediction by adding features extracted from the past publishing history of the paper’s authors, increasing the correlation between the actual and the predicted values to r = 0.81.

1 INTRODUCTION

Editors in publishing houses (as well as producers for record labels and other industries) often face the following problem: given a work, or a promise of a work, what is a good method to predict whether this work is going to be successful? Answering this question can be very useful in deciding, for instance, whether to buy the rights to the work or to pay the authors in advance. The editor’s prediction of the success of the work can, in principle, depend on the past publishing history or credentials of the author, and on the estimated quality of the item being examined.
Of course, the estimation can be quite inaccurate, as the actual success of an item depends on many elements, including complex interactions among its audience and external factors that cannot be determined in advance.

We are interested in the problem of estimating the success of a given item, understood as its impact on its community. In the case of books, for instance, success can be measured in terms of book sales. In the case of scholarly articles, success is typically measured as a function of the number of citations an article receives over time. In this paper, we deal with the citation prediction task in the context of a large set of academic articles. Our main questions are: