The GBIF Integrated Publishing Toolkit: Facilitating the Efficient Publishing of Biodiversity Data on the Internet

Tim Robertson1, Markus Döring1, Robert Guralnick2, David Bloom3*, John Wieczorek3, Kyle Braak1, Javier Otegui2, Laura Russell4, Peter Desmet5

1 Global Biodiversity Information Facility, Copenhagen, Denmark
2 University of Colorado, Boulder, Colorado, United States of America
3 University of California, Berkeley, Berkeley, California, United States of America
4 University of Kansas, Lawrence, Kansas, United States of America
5 Research Institute for Nature and Forest (INBO), Brussels, Belgium

Abstract

The planet is experiencing an ongoing global biodiversity crisis. Measuring the magnitude and rate of change more effectively requires access to organized, easily discoverable, and digitally-formatted biodiversity data, both legacy and new, from across the globe. Assembling this coherent digital representation of biodiversity requires the integration of data that have historically been analog, dispersed, and heterogeneous. The Integrated Publishing Toolkit (IPT) is a software package developed to support biodiversity dataset publication in a common format. The IPT's two primary functions are to 1) encode existing species occurrence datasets and checklists, such as records from natural history collections or observations, in the Darwin Core standard to enhance interoperability of data, and 2) publish and archive data and metadata for broad use in a Darwin Core Archive, a set of files following a standard format. Here we discuss the key need for the IPT, how it has developed in response to community input, and how it continues to evolve to streamline and enhance the interoperability, discoverability, and mobilization of new data types beyond basic Darwin Core records.
We close with a discussion of how the IPT has impacted the biodiversity research community and how it enhances data publishing in more traditional journal venues, along with new features implemented in the latest version of the IPT and plans for further enhancements.

Citation: Robertson T, Döring M, Guralnick R, Bloom D, Wieczorek J, et al. (2014) The GBIF Integrated Publishing Toolkit: Facilitating the Efficient Publishing of Biodiversity Data on the Internet. PLoS ONE 9(8): e102623. doi:10.1371/journal.pone.0102623

Editor: Damon P. Little, The New York Botanical Garden, United States of America

Received October 9, 2013; Accepted June 20, 2014; Published August 6, 2014

Copyright: © 2014 Robertson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: All funding for the development and maintenance of the Integrated Publishing Toolkit has been, and continues to be, provided by the Global Biodiversity Information Facility Work Program. All contributions have been commissioned by GBIF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* Email: dbloom@vertnet.org

Introduction

Natural history collection records and data collected in citizen science efforts represent irreplaceable information about our biosphere. The value of these legacy data sources will increase as landscape and climate change accelerate and species–environment steady-state conditions decline [1]. In order for biocollections to be utilized to their full potential, there must be widespread access to the data they contain [2–3].
Many natural history collections, however, still struggle to mobilize data [4], and neither scientists nor the public have sufficient access to these resources. Mobilizing biodiversity data en masse in ways that maximize open access and reuse requires a robust and easily usable infrastructure. Wieczorek et al. [5] discuss the need for data to be made accessible, discoverable, and integrated, and describe the challenges associated with each of these goals. Integration can, in part, be achieved through the use of community-developed metadata standards such as Darwin Core [5]. Darwin Core is a vocabulary, or set of terms, used to describe biodiversity data. These terms, comprising the Darwin Core standard (http://rs.tdwg.org/dwc/), have been rigorously vetted for utility by the biodiversity research community and are maintained through a well-defined governance process (http://www.tdwg.org/about-tdwg/process/). A community standard helps to set the stage for interoperability and enhanced data discovery, but it is only one step in the larger process of data mobilization. Equally challenging is the development of tools that convert local data resources into published record sets that conform to those community standards. Developing these publishing systems requires addressing a series of socio-technical challenges, including generating community buy-in, building capacity, and overcoming issues of scalability and sustainability as data-sharing networks continue to grow. In this paper, we describe a tool essential to the publication of biodiversity data: the Global Biodiversity Information Facility (GBIF) Integrated Publishing Toolkit (IPT, http://www.gbif.org/ipt/), a Java-based software package that provides the biodiversity community with a simple means of performing the functions needed to publish biodiversity datasets on the web.
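To make the two publishing steps concrete (encoding local records in Darwin Core terms, then packaging them as a Darwin Core Archive), the following is a minimal Python sketch under stated assumptions; it is not the IPT's implementation, which is written in Java. It assumes a hypothetical local table with columns `cat_no`, `taxon`, `lat`, `lon`, and `collected`, renames them to Darwin Core terms, and writes a zip containing a tab-delimited `occurrence.txt` core file plus a `meta.xml` descriptor that tells consumers which Darwin Core term each column holds. The local column names, the `occurrenceID` scheme, and the fixed `basisOfRecord` value are illustrative assumptions. A complete archive normally also carries an EML metadata document (`eml.xml`), which is omitted here.

```python
import io
import zipfile

# Base namespace of the Darwin Core terms.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical mapping from a local database's column names to Darwin Core
# terms; the local names are invented for illustration.
COLUMN_TO_DWC = {
    "cat_no": "catalogNumber",
    "taxon": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collected": "eventDate",
}

# Column order of the core data file; occurrenceID comes first and is the row id.
TERMS = ["occurrenceID", "basisOfRecord", "catalogNumber", "scientificName",
         "decimalLatitude", "decimalLongitude", "eventDate"]


def to_darwin_core(row, institution_code):
    """Rename one local record's columns to Darwin Core terms."""
    rec = {COLUMN_TO_DWC[k]: v for k, v in row.items() if k in COLUMN_TO_DWC}
    # Assumed identifier scheme: institution code + catalog number.
    rec["occurrenceID"] = f"{institution_code}:{rec['catalogNumber']}"
    # Assumption: every record describes a preserved museum specimen.
    rec["basisOfRecord"] = "PreservedSpecimen"
    return rec


def meta_xml(terms):
    """Build a meta.xml descriptor for a tab-delimited Occurrence core file."""
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC}{t}"/>' for i, t in enumerate(terms)
    )
    return (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        # "\\t" and "\\n" write the literal escape sequences \t and \n into the XML.
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(rows, institution_code):
    """Return the bytes of a zip holding meta.xml and occurrence.txt."""
    lines = ["\t".join(TERMS)]  # header line, declared via ignoreHeaderLines="1"
    for row in rows:
        rec = to_darwin_core(row, institution_code)
        values = [str(rec.get(t, "")) for t in TERMS]
        # No enclosing quotes are declared, so tabs/newlines cannot be escaped.
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("values must not contain tabs or newlines")
        lines.append("\t".join(values))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", meta_xml(TERMS))
        zf.writestr("occurrence.txt", "\n".join(lines) + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    local_rows = [{"cat_no": "EX-1001", "taxon": "Puma concolor", "lat": "37.87",
                   "lon": "-122.26", "collected": "1998-05-14"}]
    with zipfile.ZipFile(io.BytesIO(build_archive(local_rows, "EX"))) as zf:
        print(zf.namelist())  # ['meta.xml', 'occurrence.txt']
        print(zf.read("occurrence.txt").decode("utf-8"))
```

Keeping the column-to-term mapping in a single dictionary mirrors the idea that publishers map their existing fields onto the shared vocabulary rather than restructuring their databases; the `meta.xml` descriptor then lets any consumer read the data file without knowing the publisher's local schema.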
The IPT is built upon lessons learned from previous data publishing methods, such as Distributed Generic Information Retrieval (DiGIR, http://digir.sourceforge.net/), the Biological Collection Access Service for Europe (BioCASE, http://biocase.org/products/protocols/), and the Taxonomic Databases Working Group (TDWG) Access Protocol for Information Retrieval (TAPIR, http://www.tdwg.org/dav/subgroups/tapir/1.0/docs/tdwg_tapir_specification_2010-05-05.htm). We define the IPT, discuss the factors that led to its development and growth, and explain how it is being used and

PLOS ONE | www.plosone.org 1 August 2014 | Volume 9 | Issue 8 | e102623