A Knowledge-Based Framework for Deploying Surveillance Problem Solvers

David L. Buckeridge∗,†, Martin J. O’Connor†, Haobo Xu† and Mark A. Musen†
†Stanford Medical Informatics, Stanford University School of Medicine, Stanford, CA 94305-5479, {dlb, moconnor, hxu, musen}@smi.stanford.edu
∗VA Palo Alto Health Care System, Palo Alto, CA 94304

Abstract— Increased concern about bioterrorism and emerging diseases is driving the development of systems for early epidemic detection. These systems have requirements that differ from those of traditional public health monitoring systems: they typically deal with larger and more diverse data sets, and they must apply a wide variety of analysis techniques to track multiple syndromes in real time. Operationally, these systems must perform rapid analysis on large data sets. They must also be highly configurable, so that outbreak detection algorithms can adapt dynamically to changing data streams and disease models. To meet these needs, we have developed a knowledge-based framework for deploying surveillance problem solvers. We show how we are using this architecture in a surveillance system that uses a variety of problem solvers to perform outbreak detection.

I. INTRODUCTION

Surveillance systems designed to detect a bioterror attack or to monitor for an emerging disease have operational requirements that differ from those of existing public health monitoring systems [1], [2]. Instead of looking for strong, specific signals of one well-characterized disease in a single data source, these systems must track both specific and non-specific signals across many data sources. Rapid analysis is crucial, so such systems must efficiently process large amounts of data, often at different temporal and spatial granularities.
Different data structures and surveillance goals require different analytic approaches, so systems must allow a hybrid arrangement of problem solvers, with statistical and knowledge-based problem solvers cooperating on a task. In addition, domain knowledge should guide real-time selection of appropriate, data-specific analysis techniques for outbreak detection, so a system must support flexible, dynamic reconfiguration to address changing data streams and disease models.

To meet the complex operational and research needs of these surveillance applications, we have developed a knowledge-based framework that provides a run-time architecture for deploying surveillance problem solvers. The twin goals of this system are (1) to provide an efficient run-time environment that supports rapid data analysis; and (2) to provide a modular mediation framework that allows dynamic, knowledge-based method selection. In this paper, we first describe the main components of the framework and then focus on the control structure for deploying and coordinating problem solvers. We then describe the deployed system and provide an example of the system addressing a surveillance task.

II. BACKGROUND

A. BioSTORM

We have developed a surveillance system called BioSTORM (Biological Spatio-Temporal Outbreak Reasoning Module), which implements our knowledge-based framework. The framework, as realized in BioSTORM, has four principal components: a problem solver library [4] that supports a variety of statistical and knowledge-based techniques; a data broker, which assists in the integration of multiple data sources; a data mapper, which tailors data sources to the needs of individual problem solvers [5]; and a control structure.

1) Problem Solvers: Our laboratory has built a variety of knowledge-based problem solvers over the past decade [6]. These problem solvers generally perform per-patient analyses.
In contrast, surveillance systems generally require population-level statistical analyses with a spatial dimension. We have therefore developed a library of statistical and spatial problem solvers to complement our existing knowledge-based and temporal methods [4].

2) Data Broker: Raw surveillance data are diverse and distributed, with little common semantic structure. The data broker enables integration by providing a common semantic structure for raw data. It has an ontology of data attributes and sources, effectively providing a semantically consistent markup of disparate data sources. The data broker uses this ontology to allow problem solvers to read data transparently from a number of sources, constructing a stream of uniform data objects that conform to its common semantics.

3) Data Mapper: The data mapper takes this single stream and transforms it into separate streams for delivery to problem solvers. To work with the mapper, each problem solver must publish an input-output ontology describing the structure of the data that it wishes to receive. The mapper uses this ontology to supply each problem solver with a customized set of data objects. It provides a controlled set of possible mapping relations between concepts and attributes of the data-source ontology and a problem solver’s input-output ontology.

4) Control Structure: The control structure manages the deployment of problem solvers and the flow of data between them. It aims to provide a modular framework with dynamic configuration support, enabling knowledge-based deployment and reconfiguration of a variety of analytic methods. The data broker and the mapper are designed to provide customized
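The data-mapper idea described above can be sketched minimally: a problem solver publishes the attributes it expects, and the mapper applies a controlled set of mapping relations to turn raw data-broker records into objects conforming to that declaration. All names below (the attribute names, the mapping table, the record fields) are hypothetical illustrations, not taken from BioSTORM's actual ontologies, and real mapping relations would be richer than the simple rename-and-coerce relation shown here.

```python
# Illustrative sketch of the data-mapper step; all names are hypothetical.

# A problem solver "publishes" an input-output declaration: the
# attributes it expects, with their types.
SOLVER_IO_ONTOLOGY = {
    "case_count": int,    # daily count of syndromic cases
    "region_code": str,   # spatial unit identifier
    "date": str,          # ISO-8601 observation date
}

# A controlled mapping relation from data-source attributes to the
# solver's input attributes (here, a simple rename relation).
MAPPING = {
    "n_visits": "case_count",
    "zip": "region_code",
    "visit_date": "date",
}

def map_record(raw: dict) -> dict:
    """Transform one raw data-broker record into a solver input object,
    renaming attributes and coercing values to the declared types."""
    mapped = {}
    for src_attr, solver_attr in MAPPING.items():
        expected_type = SOLVER_IO_ONTOLOGY[solver_attr]
        mapped[solver_attr] = expected_type(raw[src_attr])
    return mapped

raw = {"n_visits": "12", "zip": "94304", "visit_date": "2004-02-01"}
print(map_record(raw))
# → {'case_count': 12, 'region_code': '94304', 'date': '2004-02-01'}
```

In practice each problem solver would carry its own declaration and mapping, which is what lets the mapper fan the broker's single uniform stream out into customized per-solver streams.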
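The paper does not specify which algorithms populate the statistical problem solver library, so as a generic illustration of a population-level detector of the kind such a library might contain, the following sketch implements a one-sided CUSUM aberration detector on daily syndromic counts; the baseline parameters and thresholds are invented for the example.

```python
# Hedged sketch: a generic one-sided CUSUM aberration detector, one
# common statistical technique for syndromic surveillance. Not taken
# from BioSTORM; parameter values here are illustrative only.

def cusum_alarms(counts, mean, std, k=0.5, h=4.0):
    """Return the indices of days where the CUSUM statistic exceeds h.

    counts: daily syndromic case counts
    mean, std: baseline mean and standard deviation (assumed known)
    k: allowance, in standard-deviation units
    h: decision threshold, in standard-deviation units
    """
    s = 0.0
    alarms = []
    for day, c in enumerate(counts):
        z = (c - mean) / std       # standardize today's count
        s = max(0.0, s + z - k)    # accumulate upward deviations only
        if s > h:
            alarms.append(day)
    return alarms

# A quiet baseline of ~10 cases/day, then a sustained rise: the
# detector alarms once the rise persists (days 9 through 11).
counts = [10, 9, 11, 10, 12, 10, 9, 14, 16, 17, 18, 19]
print(cusum_alarms(counts, mean=10.0, std=2.0))
# → [9, 10, 11]
```

A library of such detectors, each with its own published input requirements, is the sort of component the data mapper and control structure are designed to feed and coordinate.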