Diversity from Emojis and Keywords in Social Media

Melanie Swartz, George Mason University, Department of Computational and Data Sciences, Fairfax, VA, mswartz2@gmu.edu
Andrew Crooks, George Mason University, Department of Computational and Data Sciences, Fairfax, VA, acrooks2@gmu.edu
William G. Kennedy, George Mason University, Department of Computational and Data Sciences, Fairfax, VA, wkennedy@gmu.edu

ABSTRACT
Social media is a popular source for political communication and user engagement around social and political issues. While the diversity of the population participating in social and political events in person is often considered in social science research, measuring the diversity representation within online communities is not a common part of social media analysis. This paper attempts to fill that gap and presents a methodology for labeling and analyzing diversity in a social media sample based on emojis and keywords associated with gender, skin tone, sexual orientation, religion, and political ideology. We analyze the trends of diversity-related themes and the diversity of users engaging in the online political community during the lead-up to the 2018 U.S. midterm elections. Our results reveal patterns along diversity themes that otherwise would have been lost in the volume of content. Further, the diversity composition of our sample of online users rallying around political campaigns was similar to that measured in exit polls on election day. The diversity language model and methodology for diversity analysis presented in this paper can be adapted to other languages and applied to other research domains, providing social media researchers with a valuable lens for identifying the diversity of voices and the topics of interest to less-represented populations participating in an online social community.
CCS CONCEPTS
• Human-centered computing → Collaborative and social computing; • Social and professional topics → User characteristics; • Applied computing → Law, social and behavioral sciences.

KEYWORDS
Social media, emoji, diversity, elections, political campaigns

ACM Reference Format:
Melanie Swartz, Andrew Crooks, and William G. Kennedy. 2020. Diversity from Emojis and Keywords in Social Media. In International Conference on Social Media and Society (SMSociety ’20), July 22–24, 2020, Toronto, ON, Canada. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3400806.3400818

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SMSociety ’20, July 22–24, 2020, Toronto, ON, Canada
© 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7688-4/20/07...$15.00

1 INTRODUCTION
Social media studies provide insights into the themes contained within social media content and user interactions across a variety of topics including, for example, natural disasters [10], vaccinations [45], and politics [36]. While social media analysis has been used to study a variety of social and political issues, less attention has been given to measuring the diversity represented by the users and content within the social media samples used in these studies.
Applying a diversity lens to social media analysis enables researchers to better understand the diversity representation of the population being studied as well as to identify diversity-related themes within the social media content. This is particularly important with respect to social media and politics. In an era when news organizations and political leaders use social media to deliver their messages [3, 34] and political groups use social media to rally support or engagement [39, 44], it has never been more important to ensure that the diverse population of a nation is being reached and that the voices of less-represented populations in online social-political communities are not lost in the noise [20, 22].

To understand the political landscape of a country, including the concerns of the population and the composition of political parties, traditional research methods are popular because they are designed to be rigorous, targeted, statistically valid, and typically representative of the diverse populations interviewed or surveyed [19, 31]. With social media now comprising a large part of political activity and campaigning [7, 37, 44], these formal survey methods may not adequately capture or account for the topics and concerns expressed in the less formal styles and behaviors of communication (e.g., slang, emotion, sarcasm, gestures) found in social media [16, 17].

Studying social media presents its own set of opportunities and challenges [18, 38]. Many approaches that study the diversity and demographics of social media users rely on location-based information associated with where content is posted [15, 35]. Often researchers will infer demographics and diversity attributes of users based on the location of the user’s profile or content and compare it with other locational datasets for the same geographic area, such as census or voter statistics aggregated at varying scales of geography [2, 18, 31].
Using location from social media content relies on the provider of the platform as well as the individual user's settings. The accuracy and precision of this location information vary greatly, currently ranging from precise coordinates to broad geographic areas such as a city or country [29]. However, as the availability of precise geolocation information varies substantially across platforms and is becoming less available due to privacy concerns [11], alternative approaches are needed to explore diversity within social media communities and datasets.