FSPM04, Oral Presentations, Session 2
4th International Workshop on Functional-Structural Plant Models, 7-11 June 2004, Montpellier, France
Edited by C. Godin et al., pp. 61-64

Analysis of the plant architecture via tree-structured statistical models: the hidden Markov trees

J.-B. Durand*, Y. Guédon*, Y. Caraglio*, E. Costes**

* Unité Mixte de Recherche CIRAD/INRA/CNRS/Université de Montpellier II, Botanique et Bioinformatique de l'Architecture des Plantes TA40/PS2, 34398 Montpellier cedex 5, France
** Unité Mixte de Recherche INRA/AgroM/CIRAD/IRD BDPPC, Equipe « Architecture et Fonctionnement des Espèces Fruitières », 2 place Pierre VIALA, 34060 Montpellier cedex 1, France

Introduction

For many years, plant architecture has been viewed as the result of repetitions (Barlow, 1994), which occur at different levels of organisation (metamers, growth units, axes and branching systems) (Barthélémy, 1991). In addition, these plant components were shown to be distributed within individuals according to precise gradients (Barthélémy et al., 1997). The changes that occur during plant ontogeny have been described along axes for successive entities, and according to their position for lateral ones. These changes reflect the impact of the plant topology, seen as a tree structure, on the plant entities. They occur in various plant species, in which the nature of a botanical entity and that of its successors tend to be equivalent, whereas branching tends to induce marked qualitative changes between the bearing entity and the borne branching system. However, the intensity of these changes has not yet been quantified, especially for comparing successive entities with lateral ones. We aim to characterise these changes through diverse quantitative or qualitative variables attached to a given entity, such as the number of nodes, the length, the diameter and the presence or absence of flowering. These variables are called the entity attributes.
Connected entities having similar attributes can be interpreted as homogeneous zones, as opposed to ruptures or transitions between zones. For example, flowering is a factor of rupture in the plant architecture when meristem death leads to sympodial branching. The discrimination between dominating and dominated axes in plants with different degrees of hierarchy can be formulated as a search for ruptures and continuities. More generally, it makes sense to identify zones when the entities at a given scale can be clearly classified into a small number of classes defined by different morphological and functional characters. This is the case for various plant species where such a categorisation holds for the meristem functioning modes, in each of which only definite plant entities can be synthesised. For a given meristem, these modes are chronologically ordered, and the order does not depend on the meristem. These ordered modes correspond to the notion of physiological age (Barthélémy et al., 1997), which describes the stage of differentiation of meristems. The physiological age of meristems can be assessed only indirectly; it is deduced from some of the biological characteristics of the plant, which are supposed to have an impact on the measured attributes.

A statistical approach is relevant for the analysis of architectural data, both for exploratory analysis and for inferring regularities or structures that are not directly apparent in the data. In our case, the aim of statistical models is to characterise such latent structures. These models are intended to make explicit, from the attributes, regularities, patterns or levels of organisation, such as tree-structured zones. The statistical analysis of sequential data of plant architecture, illustrated in Guédon et al. (2001), is mainly based on Markovian models, for instance hidden semi-Markov chains for modelling homogeneous zones.
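To make the idea of homogeneous zones along a single axis concrete, the following minimal sketch segments one sequence of entities with a plain hidden Markov chain, decoded by the standard Viterbi algorithm. It is an illustration under stated assumptions, not the method of the cited work: a simple hidden Markov chain is swapped in for the hidden semi-Markov chains mentioned above, and the two states, all probabilities and the observed flowering sequence are hypothetical values chosen for the example.

```python
import math


def viterbi(observations, initial, transition, emission):
    """Return the most probable hidden-state sequence of a hidden Markov chain.

    observations  : list of observed symbol indices, one per entity along an axis
    initial[j]    : probability that the first entity is in state j
    transition[i][j] : probability of moving from state i to state j
    emission[j][o]   : probability of observing symbol o in state j
    """
    def log(p):
        # Log-probabilities avoid numerical underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(initial)
    # delta[j]: best log-probability of any state path ending in state j.
    delta = [log(initial[j]) + log(emission[j][observations[0]])
             for j in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(transition[i][j]))
            new_delta.append(delta[best_i] + log(transition[best_i][j])
                             + log(emission[j][obs]))
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)

    # Backtrack from the best final state to recover the whole path.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical example: presence (1) or absence (0) of flowering observed on
# ten successive entities of one axis. State 0 is a mostly vegetative zone,
# state 1 a mostly flowering zone; state 1 cannot return to state 0.
flowering = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
initial = [1.0, 0.0]
transition = [[0.8, 0.2],
              [0.0, 1.0]]
emission = [[0.9, 0.1],
            [0.2, 0.8]]
zones = viterbi(flowering, initial, transition, emission)
print(zones)
```

Each run of identical values in `zones` is one zone, and a change of value marks a transition between zones. A sequence model of this kind segments one path through the plant at a time.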
These models, though accurately accounting for the structure contained along remarkable paths in the plant (e.g. a tree trunk), are not relevant for identifying tree-structured zones, since the dependencies between entities of disjoint sequences are ignored. The complete topology must be included in the model so that the existence of multiple dependent successors (or descendants) is taken into account in the distribution of zones. We propose to use the statistical framework of hidden Markov trees (HMTs), introduced by Crouse et al. (1998) in the context of signal processing, to efficiently model homogeneous zones within a tree-structured process whose topology, fixed by the data, is thus non-random. In HMTs, the distribution of the vertex attributes is determined by the value of a discrete hidden state. The persistence of these hidden states, leading to homogeneous zones, is obtained by defining local dependencies between them. The HMT modelling is complementary with the plant comparison