Balancing Exposure and Relevance in Academic Search

Andres Ferraro, Universitat Pompeu Fabra, Barcelona, Spain, andres.ferraro@upf.edu
Lorenzo Porcaro, Universitat Pompeu Fabra, Barcelona, Spain, lorenzo.porcaro@upf.edu
Xavier Serra, Universitat Pompeu Fabra, Barcelona, Spain

ABSTRACT
The TREC 2020 Fair Ranking Track focuses on the evaluation of retrieval systems according to how well they fairly rank academic papers. The evaluation metric estimates both how relevant the ranked papers are and how fairly they represent different groups of authors, groups unknown to the track's participants. In this paper, we present the three different solutions proposed by our team for the given problem. The first solution is built on a learning-to-rank model that predicts how relevant the documents are for a given query and modifies the ranking based on this relevance and a randomization strategy. The second approach is also based on the relevance predicted by a learning-to-rank model, but it additionally selects the authors using categories defined by analyzing collaborations between authors. The third approach uses the DELTR framework and considers different categories of authors based on the corresponding H-class. The results show that the first approach gives the best performance, with the additional advantage that it does not require extra information about the authors.

KEYWORDS
Information retrieval, TREC 2020 Fair Ranking Track, exposure, fairness

1 INTRODUCTION
In the field of information retrieval, the area of fair ranking focuses on minimizing the disparity in items' exposure while at the same time returning relevant results. However, fairness is not a concept that is generalizable and universally accepted in a univocal way. In the context of the Text Retrieval Conference (TREC) 2020, the specific goal of the Fair Ranking Track was to provide fair exposure to different groups of authors in an academic search task.
To accomplish this, we explored different ways to handle the trade-off between the relevance of the results and group exposure, leading to diverse outcomes. First, we used a learning-to-rank model combined with randomization (§2.1), to ensure that results with the same relevance are likely to be ranked similarly. Second, on top of the previous logic, we built a group classification of authors based on their collaborations, making use of the authors' network. We then used this classification for re-ranking the results obtained through the relevance-based model (§2.2). Third, we considered the DELTR framework proposed in [5], using the authors' H-class as the main feature for training the learning-to-rank model (§2.3).

2 PROPOSED MODELS
We proposed three different alternatives for the re-ranking problem in the TREC 2020 Fair Ranking Track. In this section, we describe the proposed solutions. The evaluation metric was provided by the organizers of the track¹ and is expressed by the formula [4]:

\varepsilon_a^{\pi} = \sum_{i=1}^{n} \gamma^{i-1} \prod_{j=1}^{i-1} \left(1 - p(s \mid d_j)\right) \mathbb{1}(d_i \in D_a)   (1)

where:
• n: number of documents in the ranking
• D_a: documents including a as an author
• d_i: document at position i
• γ: continuation probability (fixed to 0 for the final position in the ranking)
• p(s|d): probability of stopping given that the user examined d

The final expected exposure for an author a is the sum over all the impressions for a query, Π:

\varepsilon_a = \sum_{\pi \in \Pi} \varepsilon_a^{\pi}   (2)

2.1 Relevance
Our first approach is based on the solution proposed by Bonart² [2] for the 2019 Fair TREC. We first train a ranking model [3] using features extracted from the provided data, including the length of the query and the following information for each publication: title, abstract, entities, venue, journal, authors' names, out-citations and in-citations. Once the ranking model is trained, for a given query with a result list of length k, we compute the relevance score predicted by the model for each item i (\hat{r}_i).
Then, we sort the items according to the predicted relevance and compute the mean difference (MD) following Equation 3:

MD = \frac{1}{k-1} \sum_{i=2}^{k} \left(\hat{r}_{i-1} - \hat{r}_i\right)   (3)

Since the goal of this approach is that items with the same relevance have the same probability of ending up in a given position, we apply a randomization to the results based on the predicted relevance. The new relevance (\tilde{r}_i) is computed following Equation 4:

\tilde{r}_i = \mathcal{N}(0, MD) + \hat{r}_i   (4)

We then obtain the final ranking by sorting the items using the new relevance \tilde{r}_i.

¹ https://fair-trec.github.io/2020/doc/guidelines-2020.pdf
² Code available at https://github.com/irgroup/fair-trec
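The expected-exposure metric of Equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the track's reference implementation: the function names and the default γ = 0.5 are our own choices, and the stopping probabilities p(s|d) are passed in as a plain dictionary.

```python
def ranking_exposure(ranking, author_docs, gamma=0.5, p_stop=None):
    """Expected exposure of one author in a single ranking (Equation 1).

    ranking:     list of document ids in rank order.
    author_docs: set of document ids that include the author (D_a).
    gamma:       continuation probability (default 0.5 is illustrative).
    p_stop:      dict mapping doc id -> p(s|d); missing ids default to 0.
    """
    p_stop = p_stop or {}
    exposure = 0.0
    reach_prob = 1.0  # running product of (1 - p(s|d_j)) over earlier positions
    for i, doc in enumerate(ranking):  # i = 0 corresponds to gamma^0
        if doc in author_docs:
            exposure += (gamma ** i) * reach_prob
        reach_prob *= 1.0 - p_stop.get(doc, 0.0)
    return exposure


def total_exposure(rankings, author_docs, gamma=0.5, p_stop=None):
    """Sum of per-ranking exposures over all impressions for a query (Equation 2)."""
    return sum(ranking_exposure(r, author_docs, gamma, p_stop) for r in rankings)
```

With no stopping probabilities, an author whose only paper is at the top position receives exposure 1, while the same paper at the second position receives γ = 0.5.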
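The randomized re-ranking of Equations (3) and (4) can likewise be sketched in Python. This is our own illustrative rendering (function name included): noise with standard deviation MD is drawn independently for each item and added to its predicted relevance before re-sorting.

```python
import random

def randomized_rerank(doc_ids, scores, rng=None):
    """Re-rank items by noisy relevance: noise scale is the mean difference
    (MD) between consecutive relevance scores sorted in descending order."""
    rng = rng or random.Random()
    k = len(scores)
    if k < 2:
        return list(doc_ids)
    sorted_scores = sorted(scores, reverse=True)
    # Equation 3: mean gap between consecutive sorted scores.
    md = sum(sorted_scores[i - 1] - sorted_scores[i] for i in range(1, k)) / (k - 1)
    # Equation 4: new relevance = N(0, MD) + predicted relevance.
    noisy = [rng.gauss(0.0, md) + s for s in scores]
    # Final ranking: sort by the noisy relevance, descending.
    return [doc for _, doc in sorted(zip(noisy, doc_ids), reverse=True)]
```

The output is always a permutation of the input list; items whose predicted scores differ by much more than MD rarely swap, while near-ties are shuffled often.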