Local Big Data: The Role of Libraries in Building Community Data Infrastructures

John Carlo Bertot
University of Maryland College Park
College of Information Studies
College Park, MD 20742
301.405.3267
jbertot@umd.edu

Brian S. Butler
University of Maryland College Park
College of Information Studies
College Park, MD 20742
301.405.3267
bsbutler@umd.edu

Diane M. Travis
University of Maryland College Park
College of Information Studies
College Park, MD 20742
301.405.3267
dmtravis@umd.edu

ABSTRACT
Communities face opportunities and challenges in many areas, including education, health and wellness, workforce and economic development, housing, and the environment [21]. At the same time, governments face significant fiscal constraints that limit their ability to address these challenges and opportunities. Through a combination of open government, open data, and civic engagement, however, governments, citizens, civil society groups, and others are reinventing the relationship between governments and the governed by developing crowdsourced and other innovative solutions for community advancement. Underlying this reinvention and innovation is data, particularly local data about housing, air quality, graduation rates, literacy rates, poverty, disease, and more. And yet, not all communities have the capacity to create, work with, or leverage data at the local level. Using a case study approach in a medium-sized U.S. city, this paper focuses on the issues that smaller communities face when seeking to create local data infrastructures, and on the extent to which libraries can build the capacity to work with community information and data in ways that facilitate community engagement and high-impact, locally relevant analytics.

General Terms
Data management, communities, libraries.

Keywords
Big Data, Community engagement, Data infrastructure, Data curation.

1. INTRODUCTION
Communities face opportunities and challenges in many areas, including education, health and wellness, workforce and economic development, housing, and the environment (Seattle Foundation, 2006). At the same time, governments have fiscal constraints that limit their ability to address these challenges and opportunities directly. Through a combination of open government, open data, and civic engagement, however, governments, citizens, civil society groups, and others are reinventing the relationship between governments and the governed by developing civic crowdsourcing initiatives and other innovative solutions for community advancement. Underlying this reinvention and innovation is data, particularly local data about housing, employment, air quality, graduation rates, literacy rates, poverty, business activity, and disease.

Data have existed in many key domain areas for some time, often in the form of large-scale national datasets such as those created by the U.S. Census Bureau, the Bureau of Labor Statistics, the Environmental Protection Agency, and the Centers for Disease Control and Prevention. These datasets vary in their local granularity, and many include more localized components (e.g., block, neighborhood, city, county, or region). What has changed is that emerging data integration capabilities and analytic techniques enable novel ways of viewing and analyzing these data. This, in turn, has supported new strategies for informing policy-makers, decision-makers, stakeholders, and citizens about their communities. The ability to harness geospatial, chronic disease, literacy, and other data to create data visualizations, interactive map-based analyses, and similar tools, often referred to as Big Data, can shed light on critical community needs, gaps, and solutions [1, 5, 7]. But to engage in these data science efforts, create analytic tools, and foster civic engagement, communities must first meet a set of underlying infrastructure needs.
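To make concrete what integrating data across levels of local granularity can involve, the Python sketch below rolls hypothetical block-level population counts up to locally defined neighborhoods through a crosswalk, then joins them with a hypothetical, locally collected indicator (library literacy program enrollment) to produce a rate per 1,000 residents. It is an illustration under stated assumptions, not a method described in this paper: all block IDs, neighborhood names, values, and the crosswalk are invented, and real work would use official geographic identifiers and published crosswalk files.

```python
from collections import defaultdict

# Hypothetical block-level counts, standing in for a national dataset.
# Block IDs and values are invented for illustration only.
block_population = {"B001": 120, "B002": 340, "B003": 215, "B004": 90}

# Hypothetical crosswalk assigning each block to a locally defined neighborhood.
block_to_neighborhood = {
    "B001": "Northside",
    "B002": "Northside",
    "B003": "Riverside",
    "B004": "Riverside",
}

# Hypothetical locally collected data, already at the neighborhood level
# (e.g., residents enrolled in a library literacy program).
literacy_enrollment = {"Northside": 23, "Riverside": 61}


def roll_up(block_values, crosswalk):
    """Sum block-level values into neighborhood totals using the crosswalk."""
    totals = defaultdict(int)
    for block_id, value in block_values.items():
        totals[crosswalk[block_id]] += value
    return dict(totals)


def integrate(population_by_area, local_counts):
    """Join aggregated and local data as a rate per 1,000 residents.

    Areas with zero population get None rather than raising an error.
    """
    rates = {}
    for area, population in population_by_area.items():
        count = local_counts.get(area, 0)
        rates[area] = round(1000 * count / population, 1) if population else None
    return rates


neighborhood_population = roll_up(block_population, block_to_neighborhood)
rates = integrate(neighborhood_population, literacy_enrollment)
print(neighborhood_population)  # {'Northside': 460, 'Riverside': 305}
print(rates)                    # {'Northside': 50.0, 'Riverside': 200.0}
```

The crosswalk is kept as separate data rather than hard-coded logic so that a community could maintain and revise its own neighborhood boundaries without changing the analysis code.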
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
dg.o '14, June 18-21, 2014, Aguascalientes, Mexico
Copyright 2014 ACM 978-1-4503-2901-9/14/06…$15.00.
http://dx.doi.org/10.1145/2612733.2612762

Critical elements of community data infrastructures include, but are not limited to, the following [8, 10, 14]:
• Central data repositories, where data are stored, maintained, and catalogued;
• Data standards, to which collected data adhere;
• Data communities, which will collect, maintain, and curate data;