Please cite this article in press as: Levatić, J., et al., Community structure models are improved by exploiting taxonomic rank with
predictive clustering trees. Ecol. Model. (2014), http://dx.doi.org/10.1016/j.ecolmodel.2014.10.023
Ecological Modelling xxx (2014) xxx–xxx
journal homepage: www.elsevier.com/locate/ecolmodel
Community structure models are improved by exploiting taxonomic
rank with predictive clustering trees
Jurica Levatić a,b,∗, Dragi Kocev a, Marko Debeljak a,b, Sašo Džeroski a,b
a Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
b Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000 Ljubljana, Slovenia
Article info
Article history:
Available online xxx
Keywords:
Community structure modelling
Taxonomic rank
Predictive clustering trees
Classification
Hierarchical multi-label classification
Abstract
Community structure modelling studies the influence of biotic and abiotic factors on the abundance and
composition of a given taxonomic group of organisms. With the advancement of measurement and sensor
technology, the availability, precision and complexity of environmental data constantly increase.
Nowadays, measurements of ecosystems provide a complete snapshot of the state of the system, including
information about the community structure of organisms that are present in a given sample. These
measurements include multi-species data that are typically analysed by constructing community models
as collections of models built for each species separately (local models), without considering the possible
(taxonomic) relationships among species.
In this work, we propose to construct a single community structure model for all the species (global
model) that is able to exploit the aforementioned relationships. Namely, we investigate whether the inclusion
of additional information in the form of taxonomic rank or multiple species helps to build better
community structure models. More specifically, we use predictive clustering trees (a generalized form
of decision trees) to build models for three practically relevant datasets from the task of community
structure modelling: the microarthropod community living in the agricultural soils of Denmark, organisms
living in Slovenian rivers, and vegetation found in the State of Victoria, Australia.
On each dataset, we compare the performance of four types of community structure models, which
correspond to four machine learning tasks: single-species models without taxonomic rank correspond
to single-label classification; single-species models with taxonomic rank correspond to hierarchical
single-label classification; multi-species models without taxonomic rank correspond to multi-label
classification; and multi-species models with taxonomic rank correspond to hierarchical multi-label
classification. The results of the experimental evaluation reveal that by using the taxonomic rank and the
multi-species aspect of the data, we are able to learn better community structure models.
© 2014 Elsevier B.V. All rights reserved.
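The four machine learning tasks named in the abstract differ mainly in how the target of a sample is encoded. The following is a minimal sketch of those four encodings, not the authors' implementation: it trains no model and does not use predictive clustering trees. The taxonomy, the taxon names, and the helper functions `with_ancestors` and `encode_targets` are hypothetical and introduced only for illustration.

```python
# Hypothetical toy taxonomy (child -> parent); None marks a top-level taxon.
# All taxon and species names are placeholders, not taken from the paper's datasets.
PARENT = {
    "order_A": None,
    "family_A1": "order_A",
    "species_1": "family_A1",
    "species_2": "family_A1",
    "order_B": None,
    "family_B1": "order_B",
    "species_3": "family_B1",
}
SPECIES = ["species_1", "species_2", "species_3"]


def with_ancestors(label):
    """Return the label followed by all of its ancestors up to the top-level taxon."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def encode_targets(observed):
    """Encode one sample's observed species as targets for the four tasks."""
    observed = set(observed)
    # 1) Single-label classification: one binary target per species (local models).
    single = {s: s in observed for s in SPECIES}
    # 2) Hierarchical single-label: per species, its taxonomic path if it is present.
    hier_single = {
        s: set(with_ancestors(s)) if s in observed else set() for s in SPECIES
    }
    # 3) Multi-label classification: one binary vector over all species (global model).
    multi = [int(s in observed) for s in SPECIES]
    # 4) Hierarchical multi-label: union of the taxonomic paths of all observed species.
    hmc = set()
    for s in observed:
        hmc.update(with_ancestors(s))
    return single, hier_single, multi, hmc


single, hier_single, multi, hmc = encode_targets(["species_1", "species_3"])
print(multi)
print(sorted(hmc))
```

In this sketch the hierarchical encodings add the higher taxa (here, the placeholder families and orders) to the target, which is the extra information a global model could exploit; how the paper's models actually use it is described in the article itself.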
1. Introduction
One of the most fundamental questions in ecology is: What is the
composition of a community of organisms with respect to the environment?
Prediction of such composition through modelling of the
community structure answers this question when empirical evidence
is not attainable. A community is an assemblage of species
populations that occur together in space and time. The species that
assemble a community are determined by dispersal constraints,
abiotic environmental constraints and biotic interactions (Belyea
and Lancaster, 1999). To reflect these different constraints, the
terms dispersal assembly rules, abiotic assembly rules, and biotic
assembly rules are used, respectively (Götzenberger et al., 2012).
Community ecology uses the assembly rules approach to investigate
the mechanisms that structure biological communities. The
objective of assembly rules is to predict species composition in a
specified habitat dominated by a set of environmental conditions,
at two levels: (1) simply the presence or absence of species; and
(2) the abundance of species (Keddy, 1992; Weiher and Keddy,
2001; Götzenberger et al., 2012).
∗ Corresponding author at: Department of Knowledge Technologies, Jožef Stefan
Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. Tel.: +386 1 477 3639.
E-mail addresses: jurica.levatic@ijs.si (J. Levatić), dragi.kocev@ijs.si (D. Kocev),
marko.debeljak@ijs.si (M. Debeljak), saso.dzeroski@ijs.si (S. Džeroski).
Abiotic assembly rules are studied with gradient analysis. The
choice of the gradients/factors (i.e., environmental variables) can
be subjective, because it is based on existing knowledge about the
studied species (e.g., elevation and precipitation are gradients for
forest communities) and the availability of data about these species
that are organised along the gradient of a factor. The fact that the
0304-3800/© 2014 Elsevier B.V. All rights reserved.