DATA MINING FOR SYSTEM IDENTIFICATION SUPPORT

Sandro Saitta, Benny Raphael, Ian F.C. Smith
Informatique et Mécanique Appliquées à la Construction (IMAC), Ecole Polytechnique Fédérale de Lausanne (EPFL)
CH-1015 Lausanne, Switzerland
{sandro.saitta, benny.raphael, ian.smith}@epfl.ch

Abstract: This paper presents a system identification methodology that uses data mining techniques to improve the reliability of identification. An important aspect of this methodology is the generation of a population of candidate models. Examining the characteristics of this population gives an indication of how reliable the identification is. This paper presents data mining techniques that support this examination.

1. INTRODUCTION

The goal of system identification [4] is to determine the state of a system and the values of its parameters by comparing predicted and observed responses. Correctly interpreting the models that such techniques output is therefore an important part of the task. A key challenge is that the predictions of many models may match the observations, and the best-matching model may not be the correct one.

For the purposes of this paper, the reliability of identification is defined as the probability that the candidate model(s) obtained through system identification correspond to reality. Reliability is poor when many models predict the same response at the measured locations. Factors that affect the reliability of system identification have been studied in previous research [7]. The present work uses machine learning and data mining techniques [8] to estimate the reliability of identification. Three techniques are used: correlation analysis [2], principal component analysis (PCA) [3] and decision trees [1].
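To illustrate why a population of candidate models can reveal poor reliability, the following minimal sketch (not the authors' implementation) builds a population on a toy problem and applies two of the techniques named above, correlation analysis and PCA, to it. Everything in it is an assumption made for illustration: the forward model `predict`, the parameter grid, the observed value and the `tolerance` used to accept candidates are all hypothetical. The toy model deliberately makes the measured response depend only on the sum of two parameters, so many parameter pairs match the observation equally well.

```python
import math
from itertools import product

# HYPOTHETICAL forward model (not from the paper): one measured response that
# depends only on the sum of two parameters, so many parameter pairs can
# predict the same observed value.
def predict(a: float, b: float) -> float:
    return a + b


def candidate_population(observed, tolerance, grid):
    """Keep every grid model whose prediction lies within `tolerance` of the measurement."""
    return [(a, b) for a, b in product(grid, grid)
            if abs(predict(a, b) - observed) <= tolerance]


def _centered_sums(xs, ys):
    """Return the centred sums of squares Sxx, Syy and the cross-product sum Sxy."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson(xs, ys):
    """Pearson correlation coefficient between two parameters across the population."""
    sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def first_principal_component(xs, ys):
    """Two-dimensional PCA: leading eigenvector of the covariance matrix and the
    fraction of total variance it explains (the 1/(n-1) factor cancels out)."""
    sxx, syy, sxy = _centered_sums(xs, ys)
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector of [[sxx, sxy], [sxy, syy]] for the larger eigenvalue lam1.
    if sxy != 0:
        vx, vy = lam1 - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


if __name__ == "__main__":
    population = candidate_population(observed=10, tolerance=1, grid=range(11))
    a_vals = [a for a, _ in population]
    b_vals = [b for _, b in population]
    print(len(population), "candidate models")  # 31 candidate models
    print(f"corr(a, b) = {pearson(a_vals, b_vals):.3f}")  # corr(a, b) = -0.964
    (vx, vy), share = first_principal_component(a_vals, b_vals)
    print(f"PC1 = ({vx:.3f}, {vy:.3f}), explains {share:.1%}")  # PC1 = (0.707, -0.707), explains 98.2%
```

In this toy case, 31 different models fit the single measurement, the two parameters are strongly negatively correlated, and one principal component along the direction (1, -1) carries almost all of the variance. Together these signal that the measurement constrains only the sum a + b, which is the kind of situation in which identification reliability is poor. A real application would use a structural model with many parameters and a general eigensolver rather than this closed-form two-dimensional PCA.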