Distance Matters: An Exploratory Analysis of the Linguistic Features of Flickr Photo Tag Metadata in Relation to Impression Management

Syed Ishtiaque Ahmed
Department of Information Science, Cornell University
301 College Avenue, Ithaca, NY, US
sa738@cornell.edu

Shion Guha
Department of Information Science, Cornell University
301 College Avenue, Ithaca, NY, US
sg648@cornell.edu

ABSTRACT
Tags are words that users add to shared multimedia content as metadata to facilitate better categorization and an improved sharing experience. With the burgeoning growth of shared images and videos on online social networks, a huge number of tags are added every day to public or shared databases. While one major reason for tagging a photo or a video is the functional need to organize that shared object, people also use tags as a medium of communication for conveying their emotions to their family, friends, and other contacts. The diversity in the linguistic features of these tags demonstrates some interesting patterns that reflect different facets of how people manage their online impression among their social peers. This paper investigates how some linguistic features of the tags associated with Flickr photos change with the distance between the user’s home location and the location where the photo is taken. In our exploratory analysis, “affective” and “relative” words and their multiplicative interaction show correlations with this distance. These initial findings help us to better understand online social phenomena related to the expression of emotions and the sharing of information. At the same time, they may have indirect implications for understanding impression management in online communities.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Dictionaries, Indexing methods, Linguistic processing.

General Terms
Measurement, Human Factors, Languages.

Keywords
Geotag, Flickr, Linguistic Analysis, Impression Management

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DBSocial ’12 Scottsdale, AZ USA
Copyright 2012 ACM 978-1-4503-1495-4 ...$10.00.

1. INTRODUCTION
Flickr is an online photo and video sharing network owned by Yahoo! Inc. [9]. It was launched in February 2004. Although we observe an increasing interest among Flickr users in sharing videos, we limit our interest to Flickr photos in this study. According to Wikipedia, Yahoo reported in June 2011 that Flickr had a total of 51 million registered members and 80 million unique visitors. In August 2011 the site reported that it was hosting more than 6 billion images, and this number continues to grow steadily according to reporting sources [2]. To ease the retrieval of a user’s photos and to communicate with an audience, Flickr allows its users to tag photos and videos with keywords or arbitrary text. In August 2006, Flickr began allowing users to geotag photos, which opened up the opportunity to add spatial information (such as longitude and latitude) to a photo. Since then, a large share of Flickr photos has been geotagged. With the advent of smart mobile devices with embedded GPS units and cameras, it has become very easy for users to take photos and share them on social networks with geotags. As a result, a huge number of geotagged photos are added regularly to Flickr.

Before going into detail, we define two classes of words that are used frequently in this paper.
This classification is based on the linguistic features of the words and is widely used in linguistic analyses. (1) Relative words are words related to space, time, and motion (examples: day, walk, with). (2) Affect words are words related to emotion (examples: joy, love, sad). The LIWC dictionary provides a detailed list of the different classes of words [15].

We now return to photo-tagging behavior. The two main reasons for tagging photos are (1) to classify the photos to help search engines, and (2) to make the photos easily retrievable and understandable to the users’ friends, families, other communities, or the public [4, 17]. Nonetheless, users demonstrate a high level of diversity in their tagging behavior, which is discussed in Section 2 of this paper. Drawing on the social psychological literature [12, 7, 6, 11, 10, 18, 5], we understand that this diversity in tagging is a consequence of the users’ self-representations inside their online communi-