Paper 169-27

The Web Data Entry System: Methods for Web Development and SAS® Data Management

Paul A Thompson, Sarah Littlewood, Avril J Adelman and J Philip Miller
Division of Biostatistics, Washington University, St. Louis, MO

ABSTRACT

Processing data using the SAS/IntrNet® system requires working with SAS®, HTML, and JavaScript code simultaneously. The Web Data Entry System (WDES) is a macro-based system that develops web page systems and manages the web-oriented data handling process. The system simplifies and speeds development by using SAS macros to quickly process a modified HTML file, producing a processed HTML file, the SAS dataset, and the code to process the interaction between the two. The system simplifies and speeds management of transactions using a comprehensive system of SAS macros. The WDES uses a metadata strategy, processes the final HTML file and uses the SAS resolve function to simplify modification of files. This approach is currently supporting two multi-site clinical trials in which the Division of Biostatistics, Washington University, serves as the data center.

INTRODUCTION

The internet offers an unprecedented opportunity for remote interaction with users. In the Division of Biostatistics, Washington University, we are managing data from remote clinics in several multi-site clinical trials (including the CRISP project studying polycystic kidney disease and the EXCITE trial studying stroke rehabilitation techniques). In these projects, we must provide users at multiple remote sites with the full data editing experience, including methods which allow users to add new observations to datasets, edit existing observations, (rarely) delete observations, and list values from the dataset. In return, we must provide feedback and information to the clinics (e.g., recruitment status, case information, interim results).

In our environment, dynamic (rather than static stored) HTML pages are generally used, for several reasons.
First, maintaining static pages is difficult when multiple clinics are involved, each of which requires a separate page. Second, the pages must allow collection of data (using a form version of the screen) and printing of the data (using a printable version of the screen). These are accommodated most directly using dynamic pages. Finally, the system needs to work in an environment where aspects are changing frequently. Use of a dedicated HTML editor, such as FrontPage®, is not consistent with these needs for dynamic page construction. Development and code management are difficult in an environment in which frequent modifications are made. The SAS/IntrNet approach works well in this situation and with these requirements.

To satisfy these basic needs, the Web Data Entry System (WDES) has been developed. It performs two functions:
1. It is a development system to produce HTML forms, automating SAS code production and final HTML page preparation simultaneously.
2. It is a macro-oriented, metadata-based method for web-based data management.

OTHER METHODS

When setting up methods to work with SAS/IntrNet to gather data on the web using HTML-based forms in complex and multi-form projects, several approaches are often tried initially:
1. The PUT statement can be used to write HTML screens in the DATA step.
2. The HTML pages may be kept in files, and modified using scan, substr and other character functions.

With either approach, it eventually becomes difficult to maintain a multiple-form system. Here are some problems commonly encountered:
1. Frequent HTML form changes are very difficult to manage.
2. PUT statements require nested single and double quotes to write the HTML correctly. Such statements are difficult to read and maintain, and often end up with unclosed quotes.
3. Once the original author leaves, the code is difficult to maintain.
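The quoting problem in item 2 is not specific to SAS, and can be illustrated in JavaScript, one of the three languages such a system combines. The sketch below is an illustration only, not WDES code, and all function and field names are hypothetical. It builds the same input element twice: once from quoted fragments, as a PUT statement would, and once from HTML kept as readable text with `&name`-style placeholders, loosely mimicking macro variable resolution.

```javascript
// Illustration only: hypothetical helper names, not part of the WDES.
// Values are inserted as-is; this sketch does no HTML escaping.

// Style 1: markup assembled from quoted fragments, as with PUT statements.
// Every attribute needs its own pair of nested quotes.
function fieldByConcatenation(name, value) {
  return '<input type="text" name="' + name + '" value="' + value + '">';
}

// Style 2: the markup stays readable HTML; only &placeholders are replaced.
// Placeholders with no supplied value are left untouched.
function fillTemplate(template, values) {
  return template.replace(/&([A-Za-z_][A-Za-z0-9_]*)/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

const template = '<input type="text" name="&name" value="&value">';
const fromPut = fieldByConcatenation('age', 42);
const fromTemplate = fillTemplate(template, { name: 'age', value: 42 });

console.log(fromPut);
console.log(fromPut === fromTemplate);
```

Both styles yield the same element, but a change to the form touches only the template text in the second style, with no quote boundaries to rebalance.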

These problems call for a radically different approach, one which systematically solves the problems of maintainability, development and system management of forms. The WDES provides these solutions.

KEY IDEAS FOR THE WDES

The basic ideas for the WDES are as follows (some points are not unique to WDES):

• Web to SAS to web: An HTML screen or page is used to get data from a user and send it to SAS, which performs a task and returns feedback to the user in another HTML screen.

• The SCAD: The gestalt of code (SAS, HTML and JavaScript) and data used to support the processing of a single form is called the SCAD (Screen, Controls, And Dataset). It includes these components:
1) Screen: an HTML-based form presented on the web.
2) Controls: the SAS code which moves data between the web form and the SAS dataset.
3) Dataset: a SAS dataset (two-level name, variables and their characteristics).

• Macros for common processes: The very common standardized data management processes (moving data from SAS dataset to web form and vice versa, printing data, checking for observations) are performed using macro functions. This ensures that code is easily maintained. Because data processing goes through macro functions, a change to a macro applies systematically to all programs that use it.

• Metadata for projects and SCADs: Metadata is “data about data.” A metadata system is used to provide the macro functions with various types of information. The two types of metadata, maintained using a web interface, are:
1) Project-wide: A project is a large data collection effort, generally involving multiple forms and data collection instruments, directed to fulfilling some joint mission. Projects involve common directories for SAS datasets, format libraries and stored compiled macro libraries. They often involve shared lists of subjects and participants, and thus need a common location for such information.
2) SCAD-specific: Within the project, each SCAD has unique metadata for dataset information, variable information and form information. The SCAD-level metadata is similar to the output datasets produced by PROC CONTENTS, but is more detailed and has several other types of information.

• Common representation method: A standardized method is used to name macro variables. This ensures that the system users are always clear on macro names.

• Modular HTML files: The final modified HTML file is divided into “pagelets” which are sections of the HTML file that function in a similar fashion. Thus, the header part, BY variable portion, main variable section and decision choice portion of the HTML file are placed in different HTML pagelet files. Thus, multiple SCADs can use shared components.

SUGI 27 Emerging Technologies
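The modular HTML idea in the last bullet above can be sketched as follows. This is a minimal JavaScript illustration under stated assumptions, not the WDES implementation (which performs this work in SAS): the pagelet contents, the SCAD names `demog` and `visit`, and the choice of which pagelets are shared are all hypothetical. It shows two SCADs assembled from the four kinds of pagelet named in the text, with the header and decision pagelets stored once and reused.

```javascript
// Illustration only: pagelet names and contents are hypothetical.

// Shared pagelets, stored once and reused by every SCAD.
const sharedPagelets = {
  header: '<h1>Study data entry</h1>',
  decision: '<input type="submit" name="action" value="Save">',
};

// SCAD-specific pagelets: BY-variable portion and main variable section.
const scadPagelets = {
  demog: {
    byvars: '<input type="text" name="subjid">',
    main: '<input type="text" name="birthdt">',
  },
  visit: {
    byvars: '<input type="text" name="subjid"><input type="text" name="visitno">',
    main: '<input type="text" name="weight">',
  },
};

// Assemble one page in the order given in the text:
// header, BY variables, main variables, decision choices.
function assemblePage(scadName) {
  const own = scadPagelets[scadName];
  if (!own) throw new Error('Unknown SCAD: ' + scadName);
  return [sharedPagelets.header, own.byvars, own.main, sharedPagelets.decision].join('\n');
}

const demogPage = assemblePage('demog');
const visitPage = assemblePage('visit');
console.log(demogPage);
```

Editing `sharedPagelets.header` changes the header of every assembled page at once, which is the maintenance benefit the shared components are meant to provide.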