Journal of Informetrics 11 (2017) 989–1002
Journal homepage: www.elsevier.com/locate/joi

Regular article

How is R cited in research outputs? Structure, impacts, and citation standard

Kai Li *, Erjia Yan, Yuanyuan Feng
College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, United States

Article info
Article history: Received 2 February 2017; Received in revised form 9 August 2017; Accepted 9 August 2017
Keywords: R; Software citation; Content analysis; Bibliometrics; Scholarly communication

Abstract
This paper addresses software citation by analyzing how R and its packages are cited in a sample of PLoS papers. A codebook is developed to support a content analysis of the full-text papers. Our results indicate that the software R and its packages are inconsistently cited, as is the case with other scientific software. The inconsistency derives partly from the variety of citation standards currently used for software, and partly from the fact that these standards are not well followed by authors on multiple levels. This work sheds light on the future development of software citation standards, especially given the present landscape of conflicting citation practices. Moreover, our approach furnishes a possible blueprint for dealing with the granularity of software entities in scientific citation: we consider citations of the core R software environment, of specific R packages, and of individual functions.
© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

This paper concerns the citation and mention of scientific software in research papers. It is commonly accepted that proper citation of research datasets is important to the growing field of data science, because it provides a basic mechanism for linking datasets to other scientific entities.
This linkage provides a fundamental infrastructure for other scientific tasks, including data sharing, data reuse, and reproducible research (e.g., Mooney & Newton, 2012). As data objects in their own right, scientific software environments, applications, and packages are also amenable to this infrastructure of citation, with similar benefits for researchers.

Earlier evidence suggests that data citation practices are highly inconsistent, largely because authors lack standards and policies to guide them in citing datasets (Belter, 2014; Mooney, 2011; Mooney & Newton, 2012). These findings have inspired the development of several data citation standards (e.g., Altman & King, 2007; Starr & Gastl, 2011) and the identification of key principles for citing datasets (Altman, Borgman, & Crosas, 2015; Altman & Crosas, 2013; Martone, 2014). Parallel developments are underway in the study of software citation: researchers have provided some early analysis of software citation trends (e.g., Howison & Bullard, 2015; Li, Greenberg, & Lin, 2016; Pan, Yan, Wang, & Hua, 2015) and have proposed guiding principles for future work (Smith, Katz, & Niemeyer, 2016).

* Corresponding author. E-mail address: kl696@drexel.edu (K. Li).
http://dx.doi.org/10.1016/j.joi.2017.08.003
ISSN: 1751-1577

However, several factors limit the quality of information obtainable from current software citation practices. For example, existing studies tend to describe software as a single unit in the scientific workflow; this does not always reflect the way in which the software is used. Increasingly, scientific software is extensible. Extensibility is one of the most important principles of software design (Johansson & Löfgren, 2009). R, a software environment for statistics and data visualization, represents a successful example