A Constraint Language for Process Model Induction

Matt Bravo mbravo@stanford.edu
Will Bridewell willb@csli.stanford.edu
Computational Learning Laboratory, CSLI, Stanford University, Stanford, CA 94305 USA

Ljupčo Todorovski ljupco.todorovski@fu.uni-lj.si
Computational Learning Laboratory, CSLI, Stanford University, Stanford, CA 94305 USA
University of Ljubljana, Faculty of Administration, Gosarjeva 5, SI-1000 Ljubljana, Slovenia

Abstract

We define the inductive process modeling task as the automated construction of quantitative process models from time series and background knowledge. In this task, the background knowledge comprises generic processes that along with a given set of entities define the space of candidate model structures. Typically this space grows exponentially with the size of the library, so past research introduced a hierarchical organization on the processes to constrain that space to a limited set of plausible configurations. However, organizing the processes into a hierarchy takes considerable effort, leads to implicit constraints, and creates a complex relationship between the knowledge of what processes exist and the knowledge of how one can combine them. To address these problems, we developed SC-IPM¹, an inductive process modeler that uses declarative constraints to reduce the size of the model structure space. In this paper, we describe the constraint formalism and how it guides SC-IPM’s search.

1. Introduction

Scientists build models to explain observations of complex, dynamical systems. This task requires either an implicit or explicit search through the space of plausible models (Langley et al., 1987). However, the difficulty of model development and parameter estimation encourages a greedy search and the scientists may consider only a few alternatives before selecting a final structure.
We seek to build tools that help scientists systematically create and evaluate alternative models so that they can improve their understanding of the studied phenomenon. To accomplish this goal, we require a representation for domain knowledge that lets experts interactively guide the system’s search.

¹ Pronounced “Skip ’em”, SC-IPM is an acronym for “Satisfying Constraints to Induce Process Models”.

This paper extends previous work on inductive process modeling (Langley et al., 2002; Langley et al., 2003; Todorovski et al., 2005) by adding a formalism for stating explicit, structural constraints. The general problem of inductive process modeling takes as input a set of time series for observed variables, background knowledge in the form of generic processes and entities, and a set of instantiated entities whose properties may be associated with the data. As output, a learning system should produce a quantitative process model that explains the data in terms of the background knowledge. A basic approach to this task involves instantiating the generic processes with the given entities where allowed, exhaustively combining the instantiated components into model structures, fitting the numeric parameters, and returning those models with the best quantitative fits to the data. Typically, this strategy leads to a search space that is exponential in the number of instantiated processes and that includes several implausible structures.

To address these problems, one can introduce constraints on the model structures. Previously, Todorovski et al. (2005) developed a formalism based on the decomposition of generic processes and the specification of a process hierarchy. Our recent experience indicates that this structure places a considerable burden on the domain expert, leads to implicit and intertwined constraints, and creates a complex relationship between the knowledge of what processes exist and that of how to combine them.

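
To make the size of the unconstrained space concrete, the basic generate-and-test strategy described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SC-IPM implementation: the toy library (`GENERIC_PROCESSES`), the entities (`ENTITIES`), and both function names are hypothetical, and the parameter-fitting and scoring steps are omitted.

```python
from itertools import combinations, permutations

# Hypothetical toy library: generic process name -> entity types it relates.
GENERIC_PROCESSES = {
    "growth": ("producer",),
    "decay": ("producer",),
    "grazing": ("grazer", "producer"),
}

# Hypothetical instantiated entities: entity name -> entity type.
ENTITIES = {"phyto": "producer", "zoo": "grazer"}


def instantiate(generic_processes, entities):
    """Bind each generic process to every type-compatible tuple of distinct entities."""
    instances = []
    for name, roles in generic_processes.items():
        for binding in permutations(entities, len(roles)):
            if all(entities[e] == t for e, t in zip(binding, roles)):
                instances.append((name, binding))
    return instances


def enumerate_structures(instances):
    """Yield every non-empty subset of process instances as a candidate structure."""
    for size in range(1, len(instances) + 1):
        yield from combinations(instances, size)


instances = instantiate(GENERIC_PROCESSES, ENTITIES)
structures = list(enumerate_structures(instances))
print(len(instances), len(structures))
```

With n instantiated processes, the unconstrained enumeration produces 2^n - 1 non-empty candidate structures, which is the exponential growth that structural constraints are meant to curb.
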
In this paper we describe a knowledge representation that addresses the shortcomings of the process hierar-