Decision tree and first-principles model-based approach for reactor runaway analysis and forecasting

Tamas Varga *, Ferenc Szeifert, Janos Abonyi
Department of Process Engineering, University of Pannonia, P.O. Box 158, H-8201 Veszprem, Hungary

Article info
Article history: Received 26 July 2007; Received in revised form 31 October 2008; Accepted 6 November 2008; Available online 3 January 2009
Keywords: Operating regions; Decision tree; Operating strategies; Reactor runaway; Heterocatalytic tube reactor

Abstract
Decision trees (DTs) are effective in extracting linguistically interpretable models from data. This paper shows that DTs can also be used to extract information from process models, e.g. to represent homogeneous operating regions of a complex process. To illustrate the usefulness of this novel approach, a detailed case study is presented in which DTs are used to forecast the development of runaway in an industrial, fixed-bed tube reactor. Based on first-principles knowledge and historical process data, the steady-state simulator of the tube reactor has been identified and validated. The runaway criterion based on Ljapunov's indirect stability analysis has been applied to generate a database used for DT induction. Finally, the logical rules extracted from the DTs are used in an operator support system (OSS), since they prove useful for describing the safe operating regions. A simulation study based on the dynamic model of the process is also presented. The results confirm that the synergistic combination of a DT-based expert system and the dynamic simulator yields a powerful tool for runaway forecasting and analysis, which can be used to work out safe operating strategies.
© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Information for solving engineering problems such as identification, optimization, and process monitoring can usually be extracted from different sources:
• mechanistic knowledge obtained from first principles (physics and chemistry),
• empirical or expert knowledge, expressed as linguistic rules,
• measurement data, obtained during normal operation or from an experimental process.

Different modeling paradigms should be used for an efficient utilization of these different sources of information (Kavli and Lines, 1996; Dourado, 2000, 2008). According to the type of information that is available, three basic levels of model synthesis approaches can be defined.
• Integral–differential paradigm or first-principles modeling: A complete mechanistic model is constructed from a priori knowledge and physical insight (Cott et al., 1989). Here, the system observations are used only for model validation.
• Data paradigm or empirical modeling: No physical (a priori) knowledge is used to construct the empirical model (Ljung, 1987).
• Linguistic paradigm: e.g. fuzzy logic-based heuristic and other qualitative modeling. A linguistically interpretable rule-based model is formed from the available expert knowledge (Klir and Yuan, 1995).

This means that if we have good mechanistic knowledge about the process, it can be transformed into the integral–differential paradigm, described by analytical (differential) equations. If the available information is human experience described by linguistic rules and variables, the mechanistic modeling approach is useless and rule-based approaches, such as fuzzy logic, are more appropriate (Mamdani et al., 1992; Mendel, 1995). Finally, there may be situations where the most valuable information comes from input–output data collected during operation. In this case, the application of data paradigms is the best choice. These data models are especially valuable when an accurate model of the process dynamics is needed.
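The three paradigms listed above can be contrasted on a toy problem. The following Python sketch is only an illustration under stated assumptions and is not taken from the case study: the Arrhenius parameters, the temperature thresholds, and all function names (`mechanistic_rate`, `fit_line`, `linguistic_rate`) are hypothetical, and crisp if-then rules stand in for a fuzzy rule base.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol K)


def mechanistic_rate(temp_k, pre_exp=1.0e7, act_energy=6.0e4):
    """First-principles paradigm: Arrhenius law, k = A * exp(-E / (R * T)).

    pre_exp (A) and act_energy (E) are hypothetical values for illustration.
    """
    return pre_exp * math.exp(-act_energy / (R_GAS * temp_k))


def fit_line(xs, ys):
    """Data paradigm: ordinary least-squares line y = a * x + b, no physics used."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linguistic_rate(temp_k):
    """Linguistic paradigm: crisp if-then rules standing in for expert knowledge.

    The thresholds are hypothetical; a fuzzy system would use overlapping sets.
    """
    if temp_k < 350.0:
        return "slow"
    if temp_k < 400.0:
        return "moderate"
    return "fast"


# The "measurements" are generated from the mechanistic model purely for the demo.
temps = [340.0, 360.0, 380.0, 400.0, 420.0]
rates = [mechanistic_rate(t) for t in temps]
slope, intercept = fit_line(temps, rates)

# Compare the three descriptions of the same quantity at a few temperatures.
for t in (345.0, 390.0, 410.0):
    print(t, mechanistic_rate(t), slope * t + intercept, linguistic_rate(t))
```

The straight-line fit deliberately ignores the exponential form of the rate law, so it only mimics the mechanistic model within the sampled temperature range. The rule-based description gives up numerical accuracy in exchange for statements an operator can read directly.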
Engineering Applications of Artificial Intelligence 22 (2009) 569–578
0952-1976/$ - see front matter. doi:10.1016/j.engappai.2008.11.001
* Corresponding author. Tel.: +36 88 424 447.
E-mail addresses: vargat@fmt.uni-pannon.hu (T. Varga), szeifert@fmt.uni-pannon.hu (F. Szeifert), abonyij@fmt.uni-pannon.hu (J. Abonyi).
URL: http://www.fmt.uni-pannon.hu/softcomp

Therefore, nonlinear black-box modeling is a challenging and promising research field (Bhat and McAvoy, 1990; Bhat et al., 1990; Hernandez and Arkun, 1993; Ydstie, 1990; Sjöberg et al., 1995). Unfortunately, real situations rarely fall clearly into one of the previously mentioned approaches. The modeler usually has only small amounts of different types of information with which to build the model. Therefore, in order to employ as much knowledge as possible, there is a need for the grey box (Tulleken,