A Cyber-Infrastructure for a Virtual Observatory and Ecological Informatics System - VOEIS

Clemente Izurieta 1, Sean Cleveland 1, Ivan Judson 1, Pol Llovet 1, Geoffrey Poole 1, Brian McGlynn 1, Lucy Marshall 1, Wyatt Cross 1, Gwen Jacobs 1, Barbara Kucera 2, David White 3, F. Richard Hauer 4, Jack Stanford 4
1 Montana State University, Bozeman, MT 59717
2 University of Kentucky, Lexington, KY 40506
3 Murray State University, Murray, KY 42071
4 Flathead Lake Biological Station, Division of Biological Sciences, University of Montana, Polson, MT 59860
clemente.izurieta@cs.montana.edu

Abstract— The Virtual Observatory and Ecological Informatics System (VOEIS) provides a framework for data acquisition, analysis, model integration, and the display of data products from completed workflows, including geospatially explicit models, graphs from statistical analyses, and GIS displays of classified ecological attributes on the landscape. VOEIS is intended to complement the capabilities of the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI) Hydrologic Information System (HIS) by providing sound data and metadata management capabilities for field observations and analytical lab actions. Functionality provided by VOEIS is supported by a Field Data Model (FDM) that enhances the limited geospatial capabilities of CUAHSI’s Observations Data Model (ODM). VOEIS data and metadata are also accessible via programmatic APIs, which facilitates integration with other service-oriented “e-Science” architectures and distributed frameworks.

Keywords—framework; cyber infrastructure; data and meta-data management

I. INTRODUCTION

CUAHSI’s Hydrologic Information System (HIS) is an internet-based system that supports the distribution of hydrologic data.
CUAHSI’s HIS “is comprised of hydrologic databases and servers connected through web services as well as software for data publication, discovery and access” [1,23]. Although HIS provides exceptional server-side support, as well as client tools for data entry and quality control, it presumes that individual research labs possess sound internal data management practices, does not provide tools for managing metadata about field and analytical lab actions, and has a limited data model for geospatial reference.

CUAHSI’s Observations Data Model (ODM) [6] is founded upon an information model for observations at stationary points. This model is insufficient to characterize the complex spatio-temporal relationships that arise when sampling locations are hierarchical and dynamic.

VOEIS is an integrated sensor and ecological informatics system that complements CUAHSI’s HIS capabilities by supporting all-encompassing workflows, from the collection of streaming sensor data to the application of those data in simulation models and visualizations. VOEIS facilitates the management of data and science metadata within individual research labs, solves the problem of the static geospatial data model, and interfaces with HIS to allow labs to share some or all of their data via the HIS protocols. The VOEIS infrastructure is designed to extend the functionality and knowledge representation capabilities of CUAHSI HIS by providing the necessary interfaces, software components, and a complementary Field Data Model (FDM) schema [18] that captures data processed in the lab or collected by scientists in the field.
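The contrast between observations at stationary points and hierarchical, dynamic sampling locations can be made concrete with a small sketch. The Python below is an illustration only and is not the FDM schema [18]: the class names, fields, location names, and coordinates are all hypothetical. It shows one way a sampling location might carry both a parent (hierarchy) and a time-stamped series of placements (movement over time), neither of which a single fixed point can express.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Placement:
    """Hypothetical record: where a location sat during one time interval."""
    start: datetime
    end: Optional[datetime]  # None means the placement is still current
    lat: float
    lon: float


@dataclass
class SamplingLocation:
    """Hypothetical location that may have a parent and may move over time."""
    name: str
    parent: Optional["SamplingLocation"] = None
    placements: List[Placement] = field(default_factory=list)

    def path(self) -> str:
        """Return the hierarchy from the root location down to this one."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"

    def position_at(self, when: datetime) -> Optional[Tuple[float, float]]:
        """Return (lat, lon) valid at `when`, or None if nothing covers it."""
        for p in self.placements:
            if p.start <= when and (p.end is None or when < p.end):
                return (p.lat, p.lon)
        return None


# Illustrative data only: a probe nested under a stream reach, moved once.
reach = SamplingLocation("reach-A")
probe = SamplingLocation(
    "probe-1",
    parent=reach,
    placements=[
        Placement(datetime(2010, 5, 1), datetime(2010, 7, 1), 47.0, -114.0),
        Placement(datetime(2010, 7, 1), None, 47.1, -114.1),
    ],
)
```

Under these assumptions, an observation is resolved to a position by asking the location where it was at the observation time, rather than by reading a single coordinate pair stored with the site.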
VOEIS has three basic research elements: 1) the development and deployment of sensor networks, which requires the cyber-infrastructure enhancement of hardware at two field hubs (FLBS and HBS, described in section III B); 2) the development and deployment of an informatics system to manage and serve hydrological and meteorological data and metadata, and to interface with CUAHSI’s HIS and ODM; and 3) the development and usage of protocols and APIs to interface with partnering technologies (i.e., WaterML [27]).

II. BACKGROUND

The challenges of managing scientific data are significant, and over the years they have typically fallen to investigators. Significant obstacles exist throughout the workflows supported by cyber-infrastructures: from the operation and field deployment of sensors, to data streams, to data management, to data analysis, to the use of integration tools. These multifaceted obstacles involve hardware, middleware and software. However, significant progress has been made in tackling the challenges of managing these workflows, discovering data, storing data, and publishing scientific data in architectures that are conducive to ease of use, dissemination, documentation and research for scientists.

This work is licensed under a Creative Commons Attribution 3.0 Unported License (see http://creativecommons.org/licenses/by/3.0).

PIs, researchers, managers, and scientists alike need the ability to easily access (and possibly integrate) information that is housed at geographically distinct, distributed sites. Additionally, such information is very likely to be stored in different formats and disseminated using a diverse range of communication protocols. The Tupelo middleware [4] developed at the National Center for Supercomputing Applications (NCSA) and the