The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track

Julián Urbano, Mónica Marrero, Diego Martín, Jorge Morato, Karina Robles and Juan Lloréns
University Carlos III of Madrid · Department of Computer Science
{jurbano, mmarrero, dmandres, jmorato, krobles, llorens}@inf.uc3m.es

Abstract. This paper describes the participation of the uc3m team in both tasks of the TREC 2011 Crowdsourcing Track. For the first task we submitted three runs that used Amazon Mechanical Turk: one where workers made relevance judgments based on a 3-point scale, and two similar runs where workers provided an explicit ranking of documents. All three runs implemented a quality control mechanism at the task level based on a simple reading comprehension test. For the second task we also submitted three runs: one with a stepwise execution of the GetAnotherLabel algorithm and two others with a rule-based and an SVM-based model. According to the NIST gold labels, our runs performed very well in both tasks, ranking at the top for most measures.

1 Introduction

The TREC 2011 Crowdsourcing Track was designed to investigate how to better use crowdsourcing platforms to evaluate Information Retrieval systems. The Track was divided into two tasks: obtaining topical relevance judgments from individual workers and computing consensus judgments from several workers. The Knowledge Reuse research group at the University Carlos III of Madrid put together a team of six people to participate in both tasks. We submitted three runs for each task, although most of our work was devoted to the first one. We focused on designing a practical and effective task template for gathering unconventional relevance judgments through Amazon Mechanical Turk, while studying quality control mechanisms at the task level for documents as heterogeneous as arbitrary HTML pages from the Web. In the second task, two of our runs used a rule-based model and an SVM-based machine learning model, while the third followed a stepwise execution of the GetAnotherLabel algorithm by Ipeirotis et al. [2010].

The rest of the paper is organized as follows. Section 2 describes our submissions for the first task, detailing the HIT design, document preprocessing and quality control mechanism. Section 3 describes our submissions for the second task, and Section 4 summarizes the results in both tasks. Section 5 concludes with final remarks and lines for further work.

2 Task I: Crowdsourcing Individual Judgments

For the first task we submitted three runs (see Table 1), all of which used Amazon Mechanical Turk (AMT) as the crowdsourcing platform. The task was implemented with external HITs: we hosted the templates and data on our own server and communicated with AMT through its API (a minimal illustration of this setup is sketched after Table 1). This gave us more control over the whole process, as well as the possibility of gathering additional data, such as when and for how long workers previewed our HITs, or where they came from.

Run                                      uc3m.graded           uc3m.slider           uc3m.hterms
Uploaded                                 Sep 12th, 19:22 CEST  Sep 13th, 17:48 CEST  Sep 14th, 18:44 CEST
Hours to complete                        8.5                   38                    20.5
HITs submitted (overhead)                438 (+1%)             530 (+22%)            448 (+3%)
Workers who submitted (just previewers)  29 (80)               86 (354)              33 (175)
Average judgments per worker             76                    32                    75
Cost (fees)¹                             $87 ($8.7)            $87 ($8.7)            $87 ($8.7)

Table 1. Summary of the runs submitted for Task I.
¹ This is the total cost if rejected work were not paid. See Section 2.4.4.
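
To make the external-HIT setup above concrete, the following is a minimal sketch of how such a HIT could be published programmatically. It is an illustration under stated assumptions, not our actual 2011 code: it uses the present-day boto3 MTurk client rather than the API available at the time, and the URL, reward, timing and assignment values are hypothetical placeholders rather than the parameters of our runs.

    import boto3

    # Illustration only: hypothetical parameters, not those of the 2011 runs.
    mturk = boto3.client("mturk", region_name="us-east-1")

    # An ExternalQuestion tells AMT to show a page hosted on the requester's
    # own server inside a frame; that page renders the task and posts the
    # worker's answers back to AMT when the HIT is submitted.
    external_question = (
        '<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">'
        '<ExternalURL>https://example.org/hit?set=42</ExternalURL>'  # placeholder URL
        '<FrameHeight>800</FrameHeight>'
        '</ExternalQuestion>'
    )

    hit = mturk.create_hit(
        Title="Judge the relevance of web pages to a search query",
        Description="Read a query and a set of web pages, then judge each page.",
        Keywords="relevance, judgment, search",
        Reward="0.20",                        # USD per assignment (placeholder)
        MaxAssignments=5,                     # workers per HIT (placeholder)
        AssignmentDurationInSeconds=30 * 60,  # time allowed per assignment
        LifetimeInSeconds=3 * 24 * 3600,      # how long the HIT stays available
        Question=external_question,
    )
    print(hit["HIT"]["HITId"])

Submitted answers can then be retrieved with list_assignments_for_hit and approved or rejected per assignment, which is what makes quality control at the level of individual HITs possible.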