A Temporal Search Engine for a Massive Multi-Parameter Clinical Information Database

LH Lehman, TH Kyaw, GD Clifford, RG Mark

Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA, USA

Abstract

We describe a novel search engine that can rapidly execute queries concerning changes in the gradients and absolute (and relative) values of multiple irregularly sampled and asynchronous physiological parameters over many time scales. The search engine supports search criteria for multiple physiological parameters using gradient bounds, rates of change, and threshold breaches over various time scales. Multiple signals can be searched and combined in a Boolean manner to form complex queries. Pre-computed ranges and multi-scale gradients are used to significantly reduce the search time for locating temporal events. We have implemented the search engine in MATLAB and tested the algorithm on a massive multi-parameter intensive care unit database (MIMIC II). To illustrate the use of our search approach, a set of numerical search criteria was developed by clinicians to locate evidence for important pathophysiological conditions.

1. Introduction

The Multi-parameter Intelligent Monitoring for Intensive Care II database (MIMIC II DB) [1] is a massive and growing intensive care unit (ICU) archive of over 17,000 patient records. One important challenge in clinical research using MIMIC II is identifying clinical events of interest and cohorts of patients with similar pathologies. In particular, it is difficult to detect clinical events that involve complex dynamics of multiple physiological parameters over multiple time scales. Traditional threshold-based search algorithms cannot detect such complex physiological dynamics.

We describe a search engine that is able to perform multi-parameter, temporal queries on a large-scale time-series medical database, such as MIMIC II.
The search engine is designed to serve as a filtering and event detection tool for researchers interested in investigating patient records that meet specific pathophysiological criteria, and in identifying possible onset times of certain clinically significant events. Some example temporal queries that clinicians might like to perform are as follows.

• Find episodes of lactic acidosis, in which pH ≤ 7.2, PaCO2 ≤ 35 mmHg, and lactate ≥ 2.5 mmol/L.
• Find evidence of acute kidney injury, where creatinine ≥ 1.5 mg/dL and increases by 50% within 48 hours.
• Find episodes of hemodynamic instability in which heart rate (HR) increases by 50% or more in a six-hour time window, with a simultaneous drop in arterial blood pressure (ABP) by at least 20% to below 60 mmHg.

Temporal queries on time series data cannot be efficiently implemented in a traditional SQL-based relational database. Temporal query languages, such as TQuel [2] and TSQL2 [3], express time intervals with cumbersome syntax and have limited support for multi-parameter time series data with different temporal resolutions. Saeed et al. [4] used a selected set of precomputed wavelet coefficients for efficient temporal searches. In contrast, our approach is to use pre-computed gradient bounds to reduce search time and to employ simple algorithms with little storage overhead for time series searches.

In the rest of the paper, we first characterize the time series data in the MIMIC II DB. Next, we give an overview of the search engine design, and describe a set of example searches. Finally, we demonstrate the utility of the search engine through two simple, illustrative examples of clinical analysis using the search engine.

2. Temporal searches on the MIMIC II database

The search engine [5], implemented in MATLAB, currently searches on asynchronously and irregularly sampled MIMIC II lab results, medications, nurse-verified data downloaded from the bedside monitors, and demographic information.
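
To make the notion of a temporal query on irregularly sampled data concrete, the following is a minimal Python sketch of the acute kidney injury criterion listed in the introduction (creatinine ≥ 1.5 mg/dL with an increase of at least 50% within 48 hours). It is an illustration only and not the authors' MATLAB implementation: the function name find_relative_increase, its parameters, the choice of the lowest in-window value as the baseline, and the sample creatinine series are all assumptions made for this example. It also performs a direct scan and does not use the pre-computed gradient bounds described in the paper.

```python
from bisect import bisect_left


def find_relative_increase(times_h, values, min_value, rel_increase, window_h):
    """Return (t_start, t_end) pairs flagging a rise within a time window.

    times_h : sample times in hours, sorted ascending, irregularly spaced
    values  : measurements taken at those times
    A sample at t_end is flagged when its value is >= min_value and is at
    least (1 + rel_increase) times the lowest value observed among earlier
    samples with t_end - window_h <= t_start < t_end.
    """
    events = []
    for j, (t_end, v_end) in enumerate(zip(times_h, values)):
        if v_end < min_value:
            continue  # absolute threshold not met
        # Index of the first sample inside the look-back window.
        i0 = bisect_left(times_h, t_end - window_h)
        if i0 == j:
            continue  # no earlier sample inside the window
        # Baseline (an assumption of this sketch): lowest earlier value.
        i_min = min(range(i0, j), key=lambda i: values[i])
        if v_end >= (1.0 + rel_increase) * values[i_min]:
            events.append((times_h[i_min], t_end))
    return events


# Hypothetical creatinine series (mg/dL) at irregular sample times (hours).
creatinine_t = [0.0, 7.5, 20.0, 31.0, 55.0, 90.0]
creatinine_v = [0.9, 1.0, 1.2, 1.6, 1.7, 1.1]

aki_events = find_relative_increase(creatinine_t, creatinine_v,
                                    min_value=1.5, rel_increase=0.5,
                                    window_h=48.0)
print(aki_events)
```

Because the look-back window is defined in hours rather than in sample counts, the same function applies unchanged to parameters recorded at different and irregular rates. Each returned pair gives the baseline time and the time at which the criterion is first satisfied relative to that baseline, which is one possible way to report candidate onset times.
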
We have selected 128 parameters from MIMIC II and imported the data into the search engine environment. The parameter list includes important physiological indicators such as HR, ABP, temperature, cardiac output, laboratory values, IV drug levels, and microbiology results. These 128 clinical data items for approximately 17,000 patients together with their demo-

ISSN 0276-6574 637 Computers in Cardiology 2007;34:637-640.