CCDST: A free Canadian climate data scraping tool

Charmaine Bonifacio a, Thomas E. Barchyn b, Chris H. Hugenholtz b,*, Stefan W. Kienzle a,c

a Department of Geography, University of Lethbridge, 4401 University Drive, Lethbridge, AB, Canada T1K 3M4
b Department of Geography, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4
c Applied Behavioural Ecology and Ecosystems Research Unit, University of South Africa, PO Box 392, Florida, Pretoria, South Africa

Article history: Received 11 December 2013; Received in revised form 16 October 2014; Accepted 18 October 2014; Available online 29 October 2014

Keywords: Climate data online; Scraping tool; Canada

Abstract

In this paper we present a new software tool that automatically fetches, downloads and consolidates climate data from a Web database in which the data are spread across multiple Web pages. The tool, called the Canadian Climate Data Scraping Tool (CCDST), was developed to improve access to, and simplify analysis of, climate data from Canada's National Climate Data and Information Archive (NCDIA). The CCDST deconstructs the URL for a particular climate station in the NCDIA and then iteratively modifies the date parameters to download large volumes of data; it then removes the header from each downloaded file and merges the files into a single output file. This automated sequence improves access to climate data by substantially reducing the time otherwise spent manually downloading data from multiple Web pages. To demonstrate this, we present a case study of the temporal dynamics of blowing snow events in which the CCDST saved 3.1 weeks of work. Without the CCDST, the time needed to download climate data manually limits access and discourages researchers and students from exploring climate trends. The tool is coded as a Microsoft Excel macro and is freely available to researchers and students. The main concept and structure of the tool can be adapted to other Web databases hosting geophysical data.

© Elsevier Ltd.
All rights reserved.

1. Introduction

Vast archives of climate data are publicly available through the Internet (e.g., Menne et al., 2012; Vincent et al., 2012); however, not all archives can be accessed efficiently. Often, extensive manual downloading is required, which delays analysis and adds substantial cost to projects (Thorne et al., 2011). Ideally, climate data should be easily accessible in bulk format for rapid assessment and analysis. In response, significant progress has been made in collating and distributing climate data through various Web portals. Efforts such as the Goddard Institute for Space Studies temperature record (Hansen et al., 2010), the Global Historical Climatology Network (Lawrimore et al., 2011; Menne et al., 2012), the Climatic Research Unit temperature database (Jones et al., 2012), and Berkeley Earth (Rohde et al., 2013) provide global records of climate variables. A number of Canada-specific products have also been developed, such as the Adjusted and Homogenized Canadian Climate Data (Vincent and Gullett, 1999; Wan et al., 2010; Vincent et al., 2012) and the various spatially interpolated products described by McKenney et al. (2011) and Hutchinson et al. (2009).

However, for some applications direct access to raw data is preferable. First, direct access reveals all of the variables and stations measured. For example, some weather station records contain notes from manual observations, which are invaluable for analyzing phenomena such as dust storms (e.g., Fox et al., 2012) and other weather conditions that instruments cannot record. In contrast, some portals serve only selected data fields. For example, the Berkeley Earth project focuses primarily on temperature (Rohde et al., 2013), which restricts the types of analyses possible. Design criteria in large assimilation projects also mean that some stations are omitted (Vincent et al., 2012). Second, raw data are often available at finer timescales.
Other portals serve data only at monthly or daily timescales, which are less useful for fine-scale analyses such as studies of extreme storms that depend on hourly data (e.g., Hugenholtz, 2013). Third, raw data are direct measurements, which is important for local effects that homogenization can mask (Vincent et al., 2012), although we note that homogenized data are important for some trend analyses (Rohde et al., 2013). Fourth, raw data are usually up to date, limiting problems caused by delays while portals update their records.

Computers & Geosciences 75 (2015) 13–16. Journal homepage: www.elsevier.com/locate/cageo. http://dx.doi.org/10.1016/j.cageo.2014.10.010. ISSN 0098-3004.
* Corresponding author. E-mail address: chhugenh@ucalgary.ca (C.H. Hugenholtz).

In Canada, direct public access to government-collected historical climate data is available online only through the National Climate Data and Information Archive (hereafter NCDIA; http://climate.weatheroffice.gc.ca). To access data, users select the data interval (hourly, daily, or monthly), the date range, and the station name. The site returns a list of stations meeting the user-defined