An Overview of Process Query Systems

George Cybenko, Vincent Berk
Thayer School of Engineering, Dartmouth College, Hanover, NH 03755 USA
Email: george.cybenko@dartmouth.edu, vincent.berk@dartmouth.edu

Valentino Crespi
Department of Computer Science, California State University Los Angeles, Los Angeles, CA 90032
Email: vcrespi@calstatela.edu

Robert S. Gray, Guofei Jiang
Thayer School of Engineering, Dartmouth College, Hanover, NH 03755 USA
Email: robert.s.gray@dartmouth.edu, guofei.jiang@dartmouth.edu

Abstract— Process Query Systems (PQS) are a new kind of information retrieval technology in which user queries are expressed as process descriptions. The goal of a PQS is to detect processes using a datastream or database of events that are correlated with the processes’ states. This contrasts with most traditional database query processing, information retrieval systems, and web search engines, in which user queries are typically formulated as Boolean expressions. In this paper, we outline the main features of Process Query Systems and the technical challenges that process detection entails. We also describe several important application areas that can benefit from PQS technology. Our working prototype of a PQS, called TRAFEN (for TRAcking and Fusion ENgine), is described as well.¹

I. INTRODUCTION

Many applications of current interest involve using databases or datastreams of events to detect instances of processes. In these applications, events provide evidence that is used to infer the existence and estimate the states of the processes of interest. Examples of such applications include network and computer security; network management; sensor network tracking; military situational awareness; and critical infrastructure monitoring and protection. (Appendix I gives specific details of how the techniques discussed in this paper apply to these application areas.)
While these and other applications are superficially different from one another, they share many common features when viewed from an appropriately abstract perspective. This abstract framework posits that a collection of processes, $\{M_1, M_2, \ldots\}$, produces an interleaved stream of observable events, $\ldots, e_i, e_{i+1}, e_{i+2}, \ldots$, where event $e_j$ occurs at time $t_j$ and $t_j \le t_{j+1}$. The goal in many applications is to solve the inverse problem, namely determining which processes produced which events in the observed event stream. A Process Query System (PQS) is a software system that strives to solve this inverse problem. In this paper, we adopt the viewpoint of modern systems and control theory (including communications, speech recognition, and other areas that use Hidden Markov Models [28]), in which processes have “internal” or “hidden” states that are not always externally observable. The processes’ hidden states generate observable events, from which we seek to infer the existence of the processes and to estimate the hidden states of the instantiated processes as events are collected. In this framework, we believe that many existing formulations of the above applications implicitly or explicitly seek to solve what we call the Discrete Source Separation Problem (DSSP).

¹ This paper was published in the Proceedings of the SPIE Defense and Security Symposium, 12-16 April 2004, Orlando, Florida.
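The forward model (processes with hidden states emitting an interleaved event stream) and its inverse can be made concrete with a small Python sketch. It assumes that each process model is a discrete Hidden Markov Model, in keeping with the HMM viewpoint above, though the particular models, symbols, and probabilities are invented for illustration. It further assumes exactly one instance per model and scores a candidate attribution as the product of each instance's likelihood for the events assigned to it, ignoring how the streams were interleaved. The names `ProcessModel`, `interleave`, `assignment_score`, `best_assignment`, `HANDSHAKE`, and `SCAN` are hypothetical. This is not TRAFEN's algorithm, and the exhaustive search is exponential in the number of events.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class ProcessModel:
    """Hypothetical discrete hidden Markov model of one kind of process.

    Hidden states emit observable event symbols; only the symbols are seen.
    """
    name: str
    start: dict  # hidden state -> initial probability
    trans: dict  # hidden state -> {next hidden state: probability}
    emit: dict   # hidden state -> {event symbol: probability}

    def sample(self, length, rng):
        """Run one instance forward, returning (hidden_state, symbol) pairs."""
        def pick(dist):
            return rng.choices(list(dist), weights=list(dist.values()))[0]

        state, out = pick(self.start), []
        for _ in range(length):
            out.append((state, pick(self.emit[state])))
            state = pick(self.trans[state])
        return out

    def likelihood(self, symbols):
        """Forward algorithm: P(symbols | model), summed over hidden state paths."""
        if not symbols:
            return 1.0  # an instance assigned no events contributes a factor of 1
        alpha = {s: p * self.emit[s].get(symbols[0], 0.0) for s, p in self.start.items()}
        for sym in symbols[1:]:
            alpha = {
                t: sum(a * self.trans[s].get(t, 0.0) for s, a in alpha.items())
                * self.emit[t].get(sym, 0.0)
                for t in self.emit
            }
        return sum(alpha.values())


def interleave(instances, rng):
    """Merge per-instance outputs into one stream, hiding which instance emitted what."""
    queues = [list(events) for events in instances]
    stream, truth = [], []
    while any(queues):
        k = rng.choice([i for i, q in enumerate(queues) if q])
        _hidden_state, symbol = queues[k].pop(0)
        stream.append(symbol)  # what the observer sees
        truth.append(k)        # ground truth, unavailable in practice
    return stream, truth


def assignment_score(stream, labels, models):
    """Product of per-instance likelihoods; the interleaving itself is not scored."""
    score = 1.0
    for k, model in enumerate(models):
        score *= model.likelihood([e for e, lab in zip(stream, labels) if lab == k])
    return score


def best_assignment(stream, models):
    """Exhaustively try every labeling of events to instances (toy sizes only)."""
    labelings = itertools.product(range(len(models)), repeat=len(stream))
    best = max(labelings, key=lambda labels: assignment_score(stream, labels, models))
    return list(best), assignment_score(stream, best, models)


# Hypothetical toy models that share the symbol "SYN", so attribution is not trivial.
HANDSHAKE = ProcessModel(
    "handshake",
    start={"syn": 1.0},
    trans={"syn": {"ack": 1.0}, "ack": {"syn": 1.0}},
    emit={"syn": {"SYN": 1.0}, "ack": {"ACK": 1.0}},
)
SCAN = ProcessModel(
    "scan",
    start={"probe": 1.0},
    trans={"probe": {"probe": 1.0}},
    emit={"probe": {"SYN": 0.5, "RST": 0.5}},
)

print(best_assignment(["SYN", "RST", "ACK"], [HANDSHAKE, SCAN]))  # ([0, 1, 0], 0.5)
```

In this example, `ACK` can only come from the handshake and `RST` only from the scan, which forces the shared `SYN` onto the handshake. For a stream such as `SYN, SYN, RST, ACK`, two labelings tie under this score, a small illustration of why attribution can be genuinely ambiguous and why the choice of scoring function matters.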
The DSSP is informally stated as follows:

The Discrete Source Separation Problem (DSSP) - Given a finite sequence of observed events, $e_{t_1}, e_{t_2}, \ldots, e_{t_n}$, and a collection of processes, $\{M_1, M_2, \ldots\}$, determine:
1) The “best” assignment of events to process instances, namely $f: \{1, 2, \ldots, n\} \to \mathbb{N}^+ \times \mathbb{N}^+$, where $f(i) = (j, k)$ is interpreted as meaning that event $e_{t_i}$ was caused by the $k$th instance of process model $j$ (the process detection problem);
2) The corresponding internal states and state sequences of the processes thus detected (the state estimation problem).

Here $\mathbb{N}^+ = \{1, 2, \ldots\}$ is the set of positive integers. (The name Discrete Source Separation Problem is an intentional play on the Blind Source Separation Problem that arises in continuous signal processing [19].)

Many details are intentionally missing from this simple statement of the DSSP. For example, what constitutes the “best” assignment is ultimately determined by an application-specific scoring or objective function between sequences of events and process models and their instances. Moreover, the causal relationships between processes and events, as well as the interdependencies among the processes themselves, need to be made more explicit. An overarching question in the DSSP is how processes are described and how the process models are created in the first place.

A key ingredient of the DSSP is that there is no one-to-one association of events to states or models. Such a one-