What does Twitter Measure? Influence of Diverse User Groups in Altmetrics

Simon Barthel, Sascha Tönnies, Benjamin Köhncke, Patrick Siehndel, Wolf-Tilo Balke
L3S Research Center, Hannover, Germany
{barthel, toennies, koehncke, siehndel, balke}@l3s.de

ABSTRACT
The most important goal for digital libraries is to ensure a high quality search experience for all kinds of users. To attain this goal, it is necessary to have as much relevant metadata as possible at hand to assess the quality of publications. Recently, a new group of metrics has appeared that has the potential to raise the quality of publication metadata to the next level: the altmetrics. These metrics try to reflect the impact of publications within the social web. However, it is currently still unclear if and how altmetrics should be used to assess the quality of a publication, and how altmetrics are related to classical bibliographic metrics (e.g., citations). To gain more insight into what kinds of concepts are reflected by altmetrics, we conducted an in-depth analysis on a real-world dataset crawled from the Public Library of Science (PLOS). In particular, we analyzed whether the common approach of regarding the users in the social web as one homogeneous group is sensible, or whether users need to be divided into diverse groups in order to obtain meaningful results.

Categories and Subject Descriptors
H.3.7 [Information Systems]: Digital Libraries—Standards; H.3.3 [Information Systems]: Information Search and Retrieval

Keywords
Altmetrics, Twitter, Correlation Analysis, Social Media, Expert Mining

1 INTRODUCTION
The most important goal for digital libraries is to ensure a high quality search experience for the user. One central aspect of reaching this goal is the assessment of the impact and quality of scientific publications, which is of course far from trivial.
In the scientific field, this judgment is mostly performed with respect to the reputation of the publication venue and the number of citations the publication has received. The reputation of a researcher is analogously assessed by scanning the publication venues and citation counts in the researcher's publication list.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
JCDL '15, June 21-25, 2015, Knoxville, TN, USA
© 2015 ACM. ISBN 978-1-4503-3594-2/15/06...$15.00
DOI: http://dx.doi.org/10.1145/2756406.2756913

Since the release of the Science Citation Index in the 1960s [1], several metrics have been introduced to measure the success of publications, researchers, or even whole journals in a deterministic way. Famous examples based on this index are, e.g., the impact factor [2] and the h-index [3]. Nowadays, however, the practicability of such metrics seems more and more questionable due to the increasing pace of scientific progress and communication. Citation-based metrics cannot keep up with this pace, since citation counts take several years to become stable [4].

When, on the other hand, the impact of an article is observed within the social web, e.g., on bookmarking services, microblogging platforms, or social networks, reactions can be detected immediately after the date of publication [5], [6], [7].
Moreover, these reactions originate from a much more diverse set of users, in contrast to the Science Citation Index, where only citations by peers are recorded. The general meaningfulness and the basic ideas of altmetrics [8], [9], [10], [11] have been confirmed by empirical studies [12], [13], and the usage of these measures is therefore continuously increasing. This is demonstrated by the emergence of the first Web 2.0 tools, e.g., PlumX^1 and Altmetric^2, and their adoption by several information providers, e.g., PLOS^3 and Nature^4.

The main problem with altmetrics, however, is that it is still not clear what general conclusions can be drawn when an article is frequently mentioned within the social web. It is also not clear how altmetrics are related to classical bibliographic metrics such as citations. What is certain is that this relationship is not as trivial as "more tweets mean more citations". This becomes clear when comparing the tweeting behavior within different communities: for example, the average social science article in our corpus is mentioned 16 times on Twitter, while the average chemistry article is mentioned only 2.5 times. Whether a certain number of tweets is "high" therefore always depends on the context.

The problem of relating altmetrics to citation counts has been studied extensively in recent years. This is an interesting problem, since citation counts are currently the best naive estimate of scientific quality. If a connection between citation counts and altmetrics existed, it could be used to judge the scientific quality of an article much faster than is possible with citation counts. Of course, before such concepts can be introduced in practice, several problems have to be solved.

1 http://www.plumanalytics.com
2 http://www.altmetric.com
3 http://article-level-metrics.plos.org/
4 http://www.nature.com/
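The context dependence described above can be made concrete with a simple field normalization: dividing an article's tweet count by the average of its field. The field averages below (16 for social science, 2.5 for chemistry) are the corpus statistics quoted in the text; the individual articles and the function name are hypothetical illustrations, not part of our method.

```python
# Sketch: field-normalized tweet counts, illustrating why raw counts
# are not comparable across disciplines.

# Average tweets per article, per field (corpus statistics quoted above).
FIELD_MEAN_TWEETS = {
    "social science": 16.0,
    "chemistry": 2.5,
}

def normalized_tweet_score(tweet_count, field):
    """Tweet count of an article divided by the average of its field."""
    return tweet_count / FIELD_MEAN_TWEETS[field]

# A chemistry article with 8 tweets stands out far more than a
# social science article with 12 tweets, despite the lower raw count.
print(normalized_tweet_score(12, "social science"))  # 0.75
print(normalized_tweet_score(8, "chemistry"))        # 3.2
```

Under this simple normalization, the chemistry article scores more than four times above its field average, while the social science article sits below its own average, even though its raw tweet count is higher.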