Please cite this article in press as: Levatić, J., et al., Community structure models are improved by exploiting taxonomic rank with predictive clustering trees. Ecol. Model. (2014), http://dx.doi.org/10.1016/j.ecolmodel.2014.10.023

Ecological Modelling xxx (2014) xxx–xxx
Journal homepage: www.elsevier.com/locate/ecolmodel

Community structure models are improved by exploiting taxonomic rank with predictive clustering trees

Jurica Levatić a,b,*, Dragi Kocev a, Marko Debeljak a,b, Sašo Džeroski a,b

a Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
b Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia

Article info
Article history: Available online xxx
Keywords: Community structure modelling; Taxonomic rank; Predictive clustering trees; Classification; Hierarchical multi-label classification

Abstract
Community structure modelling studies the influence of biotic and abiotic factors on the abundance and composition of a given taxonomic group of organisms. With the advancement of measurement and sensor technology, the availability, precision and complexity of environmental data constantly increase. Nowadays, measurements of ecosystems provide a complete snapshot of the state of the system, including information about the community structure of organisms that are present in a given sample. These measurements include multi-species data that are typically analysed by constructing community models as collections of models built for each species separately (local models), without considering the possible (taxonomic) relationships among species. In this work, we propose to construct a single community structure model for all the species (global model) that is able to exploit the aforementioned relationships.
Namely, we investigate whether the inclusion of additional information in the form of taxonomic rank or multiple species helps to build better community structure models. More specifically, we use predictive clustering trees (a generalized form of decision trees) to build models for three practically relevant datasets from the task of community structure modelling: the microarthropod community living in the agricultural soils of Denmark, organisms living in Slovenian rivers, and vegetation found in the State of Victoria, Australia. On each dataset, we compare the performance of four types of community structure models, which correspond to four machine learning tasks: single-species models without taxonomic rank correspond to single-label classification; single-species models with taxonomic rank correspond to hierarchical single-label classification; multi-species models without taxonomic rank correspond to multi-label classification; and multi-species models with taxonomic rank correspond to hierarchical multi-label classification. The results of the experimental evaluation reveal that by using the taxonomic rank and the multi-species aspect of the data, we are able to learn better community structure models.

0304-3800/© 2014 Elsevier B.V. All rights reserved.

* Corresponding author at: Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. Tel.: +386 1 477 3639.
E-mail addresses: jurica.levatic@ijs.si (J. Levatić), dragi.kocev@ijs.si (D. Kocev), marko.debeljak@ijs.si (M. Debeljak), saso.dzeroski@ijs.si (S. Džeroski).

1. Introduction

One of the most fundamental questions in ecology is: What is the composition of a community of organisms with respect to the environment? Prediction of such composition through modelling of the community structure answers this question when empirical evidence is not attainable. The community is an assemblage of species populations that occur together in space and time. The species that assemble a community are determined by dispersal constraints, abiotic environmental constraints and biotic interactions (Belyea and Lancaster, 1999). To reflect these different constraints, the terms dispersal assembly rules, abiotic assembly rules, and biotic assembly rules are used, respectively (Götzenberger et al., 2012). Community ecology uses the assembly rules approach to investigate the mechanisms that structure biological communities. The objective of assembly rules is to predict species composition in a specified habitat dominated by a set of environmental conditions: (1) to simply predict the presence or absence of species; and (2) to predict the abundance of species (Keddy, 1992; Weiher and Keddy, 2001; Götzenberger et al., 2012).

Abiotic assembly rules are studied with gradient analysis. The choice of the gradients/factors (i.e., environmental variables) can be subjective, because it is based on existing knowledge about the studied species (e.g., elevation and precipitation are gradients for forest communities) and the availability of data about these species that are organised along the gradient of a factor. The fact that the