Defining the Role of User Input Bias in Personalized Platforms

Daniel Trielli and Nicholas Diakopoulos
Northwestern University, School of Communication
dtrielli@u.northwestern.edu, nad@northwestern.edu

Abstract

User input bias has a relevant impact on algorithmic personalization and should be subject to further study. In this paper, we discuss ways in which user input bias might manifest itself, and how it depends on the type of algorithmic platform. We also propose avenues of investigation to further study the effect, impact, and mitigation of user input bias in algorithmic systems.

Defining input bias

Bias in information retrieval systems can be generated in many ways. First, the coverage of the data accessible to the system may be deficient, limiting the possible outputs, an effect known as coverage bias. Second, the information retrieval algorithm might be structured in a way that prioritizes, filters, classifies, and aggregates information in improper and/or unexpected ways. And finally, the users themselves might inject biases into the systems by interacting with them in particularly predisposed ways.

User input and activity may be the cause of an important portion of bias in information retrieval systems. They should also be considered part of personalization, since these inputs reflect the individual conditions of each user and result in tailor-made outputs by automated systems. In other words, if personalized news is adapted to the user, don’t we also need to understand how different users may themselves be systematically biased? The idea of user input bias is not new.
When discussing biases in computer systems in general, Friedman and Nissenbaum (1996) mention a particular type of emergent bias, one that (unlike preexisting or technical biases) appears after the systems are deployed and arises from user interaction with the system: "interfaces by design seek to reflect the capacities, character, and habits of prospective users. Thus, a shift in context of use may well create difficulties for a new set of users" (p. 335). With new digital platforms establishing more complex personalization algorithms, it is necessary to reevaluate how that human-computer interaction impacts information retrieval.

For this article, we define as user input bias any activity from the user that has an impact on how the algorithm retrieves information. In the next section we explore a few of those possible inputs and how they might be motivated.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Input bias across digital platforms

In this section, we describe how different platforms afford the user different types of activities and, therefore, are open to different manifestations of input bias.

Search engines

Among all the examples in this section, search engines are the one in which the power of user input is most evident. Without a user query, there is no search, no results, and no bias. That is not to say that other factors do not come into play when search engines look for the most relevant results, such as the user's geographical location and demographic factors. And there is a wide array of possible investigations into how search algorithms determine relevancy, and the biases associated with that process. But the primary generator of search results – and, therefore, bias – is the queries defined by the user. In that regard, investigations into search engine biases must take into account the construction of the query in a search.
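The dependence of the result set on query construction can be made concrete with a minimal sketch. The corpus, the document texts, and the keyword-matching retriever below are all invented for illustration; real search engines are vastly more sophisticated, but the underlying point holds: two queries on the same topic, phrased differently, select different subsets of the corpus before any ranking even occurs.

```python
# Toy illustration (hypothetical corpus and retriever): the user's choice
# of query terms determines which documents are eligible as results.

CORPUS = {
    "doc1": "gun rights advocates rally for second amendment protections",
    "doc2": "gun control legislation debated after recent events",
    "doc3": "study examines public opinion on gun control and gun rights",
}

def retrieve(query, corpus):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in corpus.items()
            if all(term in text.split() for term in terms)]

print(retrieve("gun rights", CORPUS))   # ['doc1', 'doc3']
print(retrieve("gun control", CORPUS))  # ['doc2', 'doc3']
```

Even in this stripped-down setting, the "gun rights" query never surfaces doc2 and the "gun control" query never surfaces doc1, so the user's framing of the query already biases the corpus from which any ranking algorithm will draw.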
But a deeper understanding is needed of why users choose the search terms they use and how impactful those decisions are. For instance, what makes a user search for "gun rights" versus "gun control"? And what is the impact of those decisions?

It is important to note that in search there is another type of input bias: the bias in the corpus of relevant results for a query (Kulshrestha et al. 2017). In other words, results might be skewed because the websites relevant for that particular query are skewed. That is an important factor, but one that also highlights the weight that query selection itself has in defining the corpus from which results will be extracted.

Another user input bias might be generated in the interaction of users with the results provided by search engines. By clicking on specific results on Google, for instance, users generate data that allows engines to predict what type of website is most favored by the user and give preference to those the next time the user makes a search. Therefore,