The DEEDS Platform: Structured Data Representation and Statistical Modeling Support

Andres Bejarano †1, Robert Flynn 2, Tyler Hoskins 2, Michael Iacchetta 2, Steven Clark 1, Guneshi Wickramaarachchi 1, Sumudinie Fernando 1, Parameswaran Desigavinayagam 1, Chandima HewaNadungodage 1, Ann Christine Catlin 1

1 ITaP Research Computing, Purdue University, 155 South Grant St, West Lafayette, Indiana 47907, USA
2 Forestry and Natural Resources, Purdue University, 715 W. State Street, West Lafayette, Indiana 47907, USA

abejara@purdue.edu†, wflynn@purdue.edu, tdhoskin@purdue.edu, miacchetta@usgs.gov, clarks@purdue.edu, gwickram@purdue.edu, swfernan@purdue.edu, pdesigav@purdue.edu, chewanad@purdue.edu, acc@purdue.edu

Abstract— The Digital Environment for Enabling Data-driven Science (DEEDS) project is a partnership between domain scientists and computer scientists to create a platform that offers end-to-end support for diverse scientific workflows. DEEDS provides services for organizing research activities, building file repositories, representing structured data, defining and connecting to computing tools, and analyzing results—integrated into a single, powerful dashboard. This paper focuses on an environmental science research project that relies on DEEDS to preserve, share, and analyze large volumes of interrelated measurements collected over time. The DEEDS DataTables component manages their complex data model as interactive multi-dimensional, hierarchical tables, with metadata for annotation, validation and customized viewing. The Tools component manages upload and configuration of user modeling tools and supports their launch and workflow tracking for results traceability. A discussion of DEEDS functionality is followed by a description of how this research group transitioned their scientific investigation to the DEEDS platform.

Keywords— research life-cycle support, environmental science, data modeling, statistical analysis

I. INTRODUCTION

Scientific investigations are complex processes that require research groups to make decisions about the methods they will use to preserve their data, build software for analysis, connect data to analysis tools, and share data and results. In most cases, these decisions are made in an ad hoc way, so that researchers responsible for different areas of the project (collecting data, writing code, analyzing results) operate in different environments. Project data, code, analysis, and results are thus fragmented, which complicates preservation, sharing, interoperability, results traceability, and reuse.

The DEEDS project recognized the need for an end-to-end solution, where the platform and its interactive interfaces provide the essential services required by researchers for representing and managing their collected data, defining metadata, exploring data collections, running analyses with selected data, tracking user workflows for reproducibility of results, and sharing all elements of the investigative process. Requirements and use of these essential services differ widely among research projects, even within the same science domain. However, DEEDS merged requirements for data management, computing services, and user interfaces into a single platform through a collaborative effort that engaged researchers in the fields of chemistry, nutrition science, environmental science, agriculture, electrical engineering, and civil engineering [1].

Research groups create shared datasets on the DEEDS platform for their projects. DEEDS provides a dataset dashboard that controls data flow, metadata, and operations through a sequence of tabs that manage dataset Cases (organization of research activities), Files (repository management), DataTables (structured data management), Tools (computing services and workflows) and Analytics (built-in ad hoc data analysis) [2].

Members of the Strategic Environmental Research and Development Program “EcoTox” project [3] are creating team-shared datasets to support their research investigation. Their dataset cases represent experimental units or aquaria defined by study properties such as animal species, chemical, and concentration. The fundamental requirement for this research group is the systematic, reliable preservation, validation, and analysis of large volumes of collected measurements and observations. Data are modeled as hierarchical, multi-dimensional data tables to represent repeated measures for cases with multiple measurement types and multiple phases over time. The key components supporting their activities are DataTables and Tools. A brief description of these two components is given first. We then describe how EcoTox researchers transitioned their scientific workflow to DEEDS.

II. DATATABLES COMPONENT

Data collected during research investigations can have very complex relationships. Researchers often collect and record these measurements as sheets in Excel workbooks. When researchers explore or analyze their collected data, manually keeping track of relationships or implementing a database schema to appropriately represent the data model is a cumbersome task. The DEEDS DataTables component supports structured data representation and management, and the platform provides easy-to-use interfaces for defining complex data models by simply uploading and linking collections of spreadsheets. A DataTable is a storage unit that defines,