Investigating “Who” in the Crowdsourcing of News Credibility

Md Momen Bhuiyan, Virginia Tech, momen@vt.edu
Amy X. Zhang, University of Washington, axz@cs.uw.edu
Connie Moon Sehat, Hacks/Hackers, connie@hackshackers.com
Tanushree Mitra, Virginia Tech, tmitra@vt.edu

ABSTRACT
Concerns about the spread of misinformation online via news articles have led to the development of many tools and processes involving human annotation of their credibility. However, much is still unknown about how different people judge news credibility, or about the quality and reliability of credibility ratings from populations of varying expertise. In this work, we consider credibility ratings from two “crowd” populations: 1) students within journalism or media programs, and 2) crowd workers on Upwork, and compare them with the ratings of two sets of experts, journalists and climate scientists, on a set of 50 climate-science articles. We find that both crowd groups’ credibility ratings correlate more strongly with the journalism experts’ ratings than with the science experts’, with 10-15 raters needed to achieve convergence. We also find that raters’ gender and political leaning affect their ratings. Across article genres (news/opinion/analysis) and source leanings (left/center/right), crowd ratings were most similar to the experts’ for opinion articles and for articles from strongly left-leaning sources.

KEYWORDS
credibility, science news, crowdsourcing, misinformation

ACM Reference Format:
Md Momen Bhuiyan, Amy X. Zhang, Connie Moon Sehat, and Tanushree Mitra. 2020. Investigating “Who” in the Crowdsourcing of News Credibility. In Proceedings of Computation+Journalism Symposium (C+J’20). ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/xxx

1 INTRODUCTION
Misinformation—or information that is false or misleading—can quickly reach thousands to millions of readers via online social and search platforms, helped by inattentive or malicious sharers and algorithms optimized for engagement. In recent years, platforms and third-party organizations have developed tools and processes for people to label the credibility of news articles in order to slow the spread of misinformation. Some initiatives include Facebook’s fact-checking program and Climate Feedback’s use of domain experts. However, expert feedback is hard to scale. Other initiatives such as TruthSquad, FactcheckEU, and WikiTribune have pursued a lower-barrier crowdsourced approach, which sometimes runs into issues with quality; workarounds include final judgments by experts, or delegating primary research to experts and secondary tasks to the crowd [3]. Efforts to automate fact-checking still require human judgment and advances in understanding the crowd labeling of data [1].

In this work, we delve more deeply into the notions of “crowd” and “expert” by examining the article credibility ratings of two populations with different backgrounds—journalism students and Upwork workers—and comparing their ratings with those of two different forms of expertise: journalistic and scientific. We also examine how personal traits and article genre relate to the ratings. Our set of 50 articles about climate science was annotated by 49 students, 26 Upwork workers, 3 science experts, and 3 journalism experts. Analyses reveal that crowd annotators’ perceptions of article credibility correlate more strongly with the journalism experts’ ratings than with the science experts’. Among personal attributes, lower education and non-Democrat political affiliation lead to higher error.
Genre-wise, the crowd groups agree more closely with the experts on opinion articles and on articles from strongly left-leaning sources. From this work, we gain a deeper understanding of the conditions under which crowdsourced annotations might serve as a proxy for reliable expert knowledge, specifically learning more about “who” makes up the annotating crowd and, in addition, how article genre may play a role.

2 RELATED WORK
Much has been made of the “wisdom of crowds,” but it is still unclear whether crowdsourcing can be an effective strategy for assessing misinformation at larger scales. This is partly due to the limits of crowds on certain topics. It is accepted that collective wisdom can be better than an individual’s judgment, including that of individual experts [17]. However, there are situations in which the collective performs far worse because it lacks enough relevant information, suggesting that a baseline of expertise in the crowd is necessary [16]. Traits related to crowd diversity and the crowd’s ability to preserve some amount of independent decision-making have been shown to be important, along with crowd size; in addition to the suitability of the raters themselves, task difficulty also plays a part [9, 13, 18]. The key question is not whether crowdsourcing is a viable approach but exactly how—what set of parameters unlocks the “wisdom of select crowds” [9]?
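To make concrete the kind of rater-convergence analysis referenced in the introduction, the sketch below subsamples k crowd raters, averages their per-article ratings, and correlates that average with the expert ratings; increasing k until the correlation plateaus indicates how many raters are needed for convergence. This is a minimal illustration, not the paper's exact procedure: the data shapes, the 7-point rating scale, the synthetic data, and the choice of Spearman correlation are all assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def subsample_correlation(crowd, experts, k, n_trials=500):
    """Mean Spearman correlation between the averaged ratings of k
    randomly sampled crowd raters and the expert ratings.

    crowd:   (n_raters x n_articles) matrix of credibility ratings
    experts: (n_articles,) vector of mean expert ratings
    """
    n_raters = crowd.shape[0]
    corrs = []
    for _ in range(n_trials):
        idx = rng.choice(n_raters, size=k, replace=False)
        crowd_mean = crowd[idx].mean(axis=0)   # aggregate the sampled raters
        rho, _ = spearmanr(crowd_mean, experts)
        corrs.append(rho)
    return float(np.mean(corrs))

# Illustrative run with synthetic data: 49 raters, 50 articles, 7-point scale.
crowd = rng.integers(1, 8, size=(49, 50)).astype(float)
experts = rng.integers(1, 8, size=50).astype(float)
for k in (1, 5, 10, 15, 20):
    print(k, round(subsample_correlation(crowd, experts, k), 3))
```

With real rating data in place of the synthetic matrices, the point at which the printed correlations stop improving as k grows would correspond to the convergence threshold reported in the findings.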