An Evolutionary Approach to Knowledge Induction: Genetic Programming in Hydraulic Engineering

Vladan Babovic*, Maarten Keijzer*, David Rodríguez Aguilera** and Joe Harrington***

* DHI Water & Environment, Agern Alle 11, DK-2970 Hørsholm, Denmark; PH +45 – 45 16 91 00; FAX +45 – 45 19 92 92; email: vmb@dhi.dk, mak@dhi.dk; http://www.d2k.dk http://www.dhi.dk
** University of Cordoba, Department of Mathematics E.T.S.I.A.M., Avda. Menéndez Pidal s/n, Córdoba, Spain; PH +34 957218519; FAX +34 957218519; email: ma2roagd@uco.es
*** Department of Building and Civil Engineering, Cork Institute of Technology, Rossa Avenue, Cork, Ireland; PH +353 – 21 – 432 63 13; FAX +353 – 21 – 434 52 44; email: jharrington@cit.ie

Abstract

The process of scientific discovery has long been viewed as the pinnacle of creative thought. Thus, to many people, including some scientists themselves, it seems an unlikely candidate for automation by computer. However, over the past two decades researchers in AI have repeatedly questioned this attitude. This paper describes a specific evolutionary algorithm technique, genetic programming, within a scientific discovery framework, as well as its application to real-world data.

Introduction

Suppose that we are given the task of modelling an unknown or poorly understood system. In such situations a logical starting point is the design of measurement campaigns and the collection of data. One usually measures the forcing variables (those that act on the system from outside) and, simultaneously, the response of the system: the change in its state (state, or internal, variables) and the change in the corresponding output (resulting functions). Once enough data of sufficient quality have been collected, one can attempt to identify the system. Three scenarios are then possible (Kompare, 1995):

1. Nothing useful can be concluded from the observations.
This can happen if the measuring campaign was poorly designed, or not carried out over a sufficiently long period of time, or if relationships among the variables simply do not exist. More measurements, or redesigned and more elaborate observations, are needed to improve the situation.
2. Sometimes we may end up with a statistical, black box model. With this category of models we will be able to predict the behaviour of the system, although we will not be able to characterise its intrinsic structure. In other words, we will be able to say what the model does, but not how. In addition, we will not be able to guarantee the behaviour of such a model in regions not covered by the data from which it was constructed, because the model captures only the relationships found within the given data.
3. In some cases we may be able to recognise patterns within the data and infer from these patterns the basic processes at work in the observed system. After repeated measurements we should be able to develop a conceptual (mechanistic) model. Such a model is a so-called white box, or transparent, model, and we should be able to say both what the model does and how it does it. Because of its conceptual background, we can be much more certain that the model represents reality. This also helps when the model is applied to data outside the range over which it was constructed.
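
The difference between scenarios 2 and 3 outside the range of the data can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "true" system y = 2 x^1.5, the ten noise-free observations on 0.1 to 1.0, the nearest-neighbour lookup standing in for a black box model, and the power-law form y = a x^b standing in for a structure suggested by conceptual understanding.

```python
import math

def true_system(x):
    # Hypothetical system, used only to generate synthetic observations.
    return 2.0 * x ** 1.5

# Synthetic "measurement campaign" covering a limited range, x = 0.1 ... 1.0.
xs = [0.1 * i for i in range(1, 11)]
ys = [true_system(x) for x in xs]

def black_box(x):
    """Purely data-driven stand-in: return the observation nearest to x."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

def fit_power_law(x_obs, y_obs):
    """Least-squares fit of y = a * x**b, done as a line in log-log space."""
    lx = [math.log(x) for x in x_obs]
    ly = [math.log(y) for y in y_obs]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - b * mean_x)
    return a, b

a, b = fit_power_law(xs, ys)

def white_box(x):
    """Model whose structure (a power law) is assumed known in advance."""
    return a * x ** b

def abs_error(model, x):
    return abs(model(x) - true_system(x))

x_inside, x_outside = 0.5, 3.0  # inside and outside the observed range
for name, model in (("black box", black_box), ("white box", white_box)):
    print(f"{name}: error inside = {abs_error(model, x_inside):.3g}, "
          f"error outside = {abs_error(model, x_outside):.3g}")
```

Both models reproduce the observations within the sampled range, but only the one with the assumed structure remains accurate at x = 3.0. The lookup simply returns the last value it has seen. With noisy data or a wrongly assumed structure the contrast would be less clean, so this should be read as an illustration of the argument rather than as evidence for it.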