Diversity from Emojis and Keywords in Social Media
Melanie Swartz
George Mason University,
Department of Computational and
Data Sciences, Fairfax, VA
mswartz2@gmu.edu
Andrew Crooks
George Mason University,
Department of Computational and
Data Sciences, Fairfax, VA
acrooks2@gmu.edu
William G. Kennedy
George Mason University,
Department of Computational and
Data Sciences, Fairfax, VA
wkennedy@gmu.edu
ABSTRACT
Social media is a popular source for political communication and
user engagement around social and political issues. While the
diversity of the population participating in social and political events
in person is often considered in social science research,
measuring the diversity represented within online communities is
not a common part of social media analysis. This paper attempts
to fill that gap and presents a methodology for labeling and
analyzing diversity in a social media sample based on emojis and
keywords associated with gender, skin tone, sexual orientation,
religion, and political ideology. We analyze the trends of
diversity-related themes and the diversity of users engaging in the online
political community during the lead-up to the 2018 U.S. midterm
elections. Our results reveal patterns along diversity themes that
otherwise would have been lost in the volume of content. Further,
the diversity composition of our sample of online users rallying
around political campaigns was similar to that measured in exit
polls on election day. The diversity language model and methodology
for diversity analysis presented in this paper can be adapted to
other languages and applied to other research domains to provide
social media researchers a valuable lens to identify the diversity of
voices and topics of interest for the less-represented populations
participating in an online social community.
CCS CONCEPTS
· Human-centered computing → Collaborative and social computing;
· Social and professional topics → User characteristics;
· Applied computing → Law, social and behavioral sciences.
KEYWORDS
Social media, emoji, diversity, elections, political campaigns
ACM Reference Format:
Melanie Swartz, Andrew Crooks, and William G. Kennedy. 2020. Diversity
from Emojis and Keywords in Social Media. In International Conference
on Social Media and Society (SMSociety ’20), July 22–24, 2020, Toronto, ON,
Canada. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3400806.3400818
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
SMSociety ’20, July 22–24, 2020, Toronto, ON, Canada
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7688-4/20/07. . . $15.00
https://doi.org/10.1145/3400806.3400818
1 INTRODUCTION
Social media studies provide insights on themes contained within
social media content and user interactions across a variety of topics
including, for example, natural disasters [10], vaccinations [45], and
politics [36]. While social media analysis has been used to study
a variety of social and political issues, there has been less atten-
tion given to measuring the diversity represented by the users and
content within the social media sample for these various studies. Ap-
plying a diversity lens to social media analysis enables researchers
to better understand the diversity representation of the population
being studied as well as to identify diversity-related themes within
the social media content. This is particularly important with re-
spect to social media and politics. In an era when news and political
leaders are using social media to deliver their messages [3, 34] and
political groups use social media to rally support or engagement
[39, 44], it has never been more important to ensure that the
diverse population of a nation is being reached and the voices of
less-represented populations in online social-political communities are
not lost in the noise [20, 22].
To understand the political landscape of a country, including the
concerns of the population and composition of political parties, tra-
ditional research methods are popular because they are designed to
be rigorous, targeted, statistically valid, and typically representative
of the diverse populations interviewed or surveyed [19, 31]. With
social media now comprising a large part of political activity and
campaigning [7, 37, 44], these formal survey methods may not ade-
quately capture or account for the topics and concerns expressed
in less formal styles and behaviors of communication (e.g., slang,
emotion, sarcasm, gestures) in social media [16, 17]. Studying social
media presents its own set of opportunities and challenges [18, 38].
Many approaches that study the diversity and demographics of
social media users rely on location-based information associated
with where content is posted [15, 35]. Often researchers will
infer demographics and diversity attributes of users based on the
location of the user’s profile or content and compare it with other
locational datasets for the same geographic area such as census or
voter statistics aggregated at varying scales of geography [2, 18, 31].
The location attached to social media content depends on the platform
provider as well as on individual user settings. The accuracy and
precision of this location information vary greatly, currently
ranging from precise coordinates to broad geographic areas such
as a city or country [29]. However, because precise geolocation
information is unevenly available across platforms and is becoming
scarcer due to privacy concerns [11], alternative
approaches are needed to explore diversity within social media
communities and datasets.
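
As an illustration of a location-free alternative, the emoji- and keyword-based labeling summarized in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the methodology developed in this paper: the function name `label_diversity` and the small `KEYWORD_LEXICON` are hypothetical and chosen only for illustration. The only facts relied on are that Unicode encodes skin tone as five modifier code points (U+1F3FB to U+1F3FF) and encodes the female and male signs as U+2640 and U+2642.

```python
import re

# Unicode skin tone modifiers (Fitzpatrick scale), U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {
    "\U0001F3FB": "light",
    "\U0001F3FC": "medium-light",
    "\U0001F3FD": "medium",
    "\U0001F3FE": "medium-dark",
    "\U0001F3FF": "dark",
}

# Gender signs used inside gendered emoji sequences.
GENDER_SIGNS = {"\u2640": "female", "\u2642": "male"}

# Hypothetical keyword lexicon for illustration only; not the paper's lists.
KEYWORD_LEXICON = {
    "religion": {"church", "mosque", "synagogue", "temple"},
    "political_ideology": {"liberal", "conservative", "progressive"},
}


def label_diversity(text):
    """Return diversity labels found in one post or profile description."""
    labels = {"skin_tone": set(), "gender": set()}

    # Emoji cues: scan code points for skin tone modifiers and gender signs.
    for ch in text:
        if ch in SKIN_TONE_MODIFIERS:
            labels["skin_tone"].add(SKIN_TONE_MODIFIERS[ch])
        elif ch in GENDER_SIGNS:
            labels["gender"].add(GENDER_SIGNS[ch])

    # Keyword cues: match lowercase word tokens against each category.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    for category, words in KEYWORD_LEXICON.items():
        matched = tokens & words
        if matched:
            labels[category] = matched

    return labels


# Example: a woman raising her hand, with a medium-dark skin tone modifier.
post = "Proud conservative voter \U0001F64B\U0001F3FE\u200D\u2640\uFE0F"
result = label_diversity(post)
```

Scanning individual code points, rather than whole emoji sequences, keeps the sketch independent of any emoji library. A fuller treatment would also need to handle emojis whose base character implies gender and keywords that are ambiguous out of context.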