System Theory: From Classical State Space to Variable Selection and Model Identification
Diego Liberati
Italian National Research Council, Italy

Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

System Theory is a powerful paradigm for building abstract models of real processes that are accurate enough to capture the salient underlying dynamics, while keeping the mathematical tools simple enough to be manageable. Its typical approach is to describe reality via a reduced set of ordinary differential equations (ODEs) linking the variables. A classical application is circuit theory, which links the intensive (voltage) and extensive (current) variables, respectively across and through each simplified element, by means of equilibrium laws at nodes and around elementary circuits. When such relationships are linear (as in ideal capacitors, resistors, and inductors, just to stay in the circuit field), a full battery of theorems helps in understanding the general properties of the ODE system.

Positive systems, quite often used for compartmental processes such as reservoirs in nature and pharmacologic concentrations in medical therapy, enjoy most of the properties of linear systems, with the additional nonlinear constraint of nonnegativity. More general nonlinear systems are less easily treatable, unless only a simple form of nonlinearity is taken into account, such as the ideal characteristic of a diode in circuit theory.

When the physics of the process is well known, as in the examples mentioned, it is fairly easy to identify a small number of variables that fully describe the dynamics of the process, once their interrelations are properly modeled: this is the classical way to approach such a problem.
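To make the classical state-space description concrete, the following is a minimal sketch of a series RLC circuit written as a linear ODE system dx/dt = A x + B u, with state x = (capacitor voltage, inductor current). The component values, the function names (`derivative`, `rk4_step`, `simulate`), and the choice of a fixed-step Runge-Kutta integrator are illustrative assumptions, not taken from the article.

```python
# Series RLC circuit as a linear state-space model dx/dt = A x + B u.
# State x = (v_C, i_L): capacitor voltage and inductor current.
# Component values are hypothetical, chosen only for illustration.

R, L, C = 1.0, 0.5, 0.25   # ohm, henry, farad (assumed values)

# Node law:  C dv_C/dt = i_L
# Loop law:  L di_L/dt = u - R i_L - v_C
A = [[0.0, 1.0 / C],
     [-1.0 / L, -R / L]]
B = [0.0, 1.0 / L]


def derivative(x, u):
    """Return dx/dt = A x + B u for state x and source voltage u."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]


def rk4_step(x, u, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = derivative(x, u)
    k2 = derivative([x[i] + 0.5 * dt * k1[i] for i in range(2)], u)
    k3 = derivative([x[i] + 0.5 * dt * k2[i] for i in range(2)], u)
    k4 = derivative([x[i] + dt * k3[i] for i in range(2)], u)
    return [x[i] + dt / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
            for i in range(2)]


def simulate(x0, u, dt, steps):
    """Integrate from x0 under a constant input u; return the trajectory."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        x = rk4_step(x, u, dt)
        trajectory.append(x)
    return trajectory


# Step response: the source switches from 0 V to 1 V at t = 0.
trajectory = simulate(x0=[0.0, 0.0], u=1.0, dt=0.01, steps=3000)
v_final, i_final = trajectory[-1]
```

With a constant source, the two state variables settle to the equilibrium predicted by the circuit laws: the capacitor voltage approaches the source voltage and the current decays to zero. Two variables suffice here because the physics of the circuit is known in advance, which is exactly the situation the classical approach assumes.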
Nowadays, on the other hand, new fields such as bioinformatics are emerging, in which many data are collected over several, possibly correlated, variables whose joint dynamics follow a law that is neither known a priori nor easily understood on the basis of state-of-the-art knowledge. Given so much data, hard for a human reader to correlate but probably hiding interesting properties, a typical goal is to face the problem on the basis of a meaningful, and hopefully reduced, subset of the measured variables. The complexity of the problem thus makes it worthwhile to resort to automatic classification procedures in order to preprocess the collected data. The original question then arises of reconstructing a synthetic mathematical model that captures the most important relations between variables, in order to infer their hidden relationships, as in systems biology.

BACKGROUND

The tasks just introduced, selecting salient variables and identifying their relationships from data, may be accomplished sequentially, with various degrees of success, in a variety of ways. Principal component analysis orders linear combinations of the variables from the most salient to the least, but only within a linear framework. Partial least squares do allow extension to nonlinear models, provided that one has prior information on the structure of the nonlinearity involved; in fact, the regression equation needs to be written before its parameters are identified. Clustering may operate even in an unsupervised way, without an a priori correct classification of a training set (Boley, 1998). Neural networks are known to learn the embedded rules, with the indirect possibility (Taha & Ghosh, 1999) of making the rules explicit or of highlighting the salient variables. Decision trees (Quinlan, 1994) are a popular framework providing a satisfactory answer to the needs recalled above. Four main general-purpose approaches will be briefly discussed in the present article.
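As an illustration of how principal components rank directions by saliency within a linear framework, the sketch below extracts the leading principal component of a small synthetic data set by power iteration on the sample covariance matrix. The data, the function names (`covariance_matrix`, `leading_component`), and the use of power iteration rather than a full eigendecomposition are assumptions made for this example only.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1)
             for b in range(d)] for a in range(d)]


def leading_component(cov, iterations=200):
    """Power iteration: unit vector of largest variance and that variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                   for a in range(d))
    return v, variance


# Synthetic, deterministic data (hypothetical): the second variable is
# roughly twice the first, the third is a small unrelated oscillation.
data = [[float(t), 2.0 * t + 0.5 * math.sin(t), 0.1 * (-1) ** t]
        for t in range(20)]

cov = covariance_matrix(data)
component, variance = leading_component(cov)

# Fraction of the total variance captured by the leading direction.
explained = variance / sum(cov[j][j] for j in range(len(cov)))

# Variables ordered by the magnitude of their loading on that direction.
ranking = sorted(range(len(component)), key=lambda j: -abs(component[j]))
```

In this toy case a single direction captures nearly all of the variance, and the loadings show that the third variable contributes almost nothing to it. The ranking is only meaningful when the dependence among variables is linear, which is the limitation noted above.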
In order to reduce the dimensionality of the problem, thus simplifying both the computation and the subsequent understanding of the solution, the critical problem of selecting the most salient variables must be solved. This step is itself delicate, since it points to the very core of the information to look at. A very simple approach is to resort to cascading a divisive partitioning of the data orthogonal to the principal directions (PDDP; Boley, 1998), already proven to be successful in the context of