Spatio-temporal data classification through multidimensional sequential patterns: Application to crop mapping in complex landscape

Yoann Pitarch a, Dino Ienco c,d,*, Elodie Vintrou b,d, Agnès Bégué b,d, Anne Laurent e, Pascal Poncelet e, Michel Sala e, Maguelonne Teisseire c,d

a IRIT, Toulouse, France
b CIRAD, France
c Irstea, France
d UMR TETIS - 500 rue Jean-François Breton, 34093 Montpellier, France
e LIRMM - CNRS - UM2, France

Article info
Article history: Received 20 August 2013; Received in revised form 8 July 2014; Accepted 2 September 2014; Available online 29 September 2014
Keywords: Knowledge discovery; Data mining; MODIS images; Remote sensing; Land cover

Abstract
The main use of satellite imagery concerns the processing of the spectral and spatial dimensions of the data. However, to extract useful information, the temporal dimension also has to be accounted for, which increases the complexity of the problem. For this reason, there is a need for data mining techniques suited to this source of data. In this work, we developed a data mining methodology to extract multidimensional sequential patterns that characterize temporal behaviors. We then used the extracted multidimensional sequences to build a classifier, and show how the patterns help to distinguish between the classes. We evaluated our technique using a real-world dataset containing information about land use in Mali (West Africa) to automatically recognize whether an area is cultivated or not.
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction
This work was motivated by a real-world problem in which the final goal is to monitor areas that are not easy to access in order to perform food risk analysis. The northern fringe of sub-Saharan Africa (Sahel belt) is considered to be particularly vulnerable to climate variability and change, which is why food security remains a major issue in this region (Lobell et al., 2008).
One of the preliminary stages necessary for analyzing the effects of climate variability on agriculture is a reliable estimate of the cultivated land at a national scale. To perform this task, we need to know whether data from several sources (e.g. field surveys, climate, satellite images) can provide a correct assessment of the distribution of cultivated land at a national scale. Although data from satellite images are very useful for monitoring land surface, the large quantity of spatio-spectro-temporal measurements stored by the instruments limits their usefulness as a source of information. In recent years, research on spatio-temporal databases has consequently increased alongside research on mining such data (Bogorny and Shekhar, 2010). In our analysis, the main problem is to combine heterogeneous sources of information (temporal and static information) to exploit the full set of dimensions without losing information. To this end, we first need to combine multidimensional temporal and static data. Second, we need a model from which the analyst can easily obtain a clear explanation, since many previous models are black box models that do not provide a useful explanation for the classification (Qin and Obradovic, 2006). As a first step, we thus developed a data mining methodology to extract relevant multidimensional sequential patterns from both static descriptions and temporal behaviors. In the data mining literature, the term multidimensional is used as a synonym for multi-attribute, the term more commonly employed by researchers in statistics. In the rest of the paper we mainly use multidimensional, as the proposed approach comes from the data mining field. In our scenario, multidimensional (or multi-attribute) temporal measurements are obtained from moderate resolution remote sensing images. In the second step, we exploit the acquired knowledge (in terms of sequential patterns) to build a classifier to distinguish between cultivated and non-cultivated areas.
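To make the two steps described above more concrete, the following is a minimal sketch in Python, not the algorithm developed in this paper. It assumes that each area is represented as a time-ordered list of itemsets, that a pattern is supported by an area when it occurs as an order-preserving subsequence, and that an area is assigned to the class whose patterns it matches most often. The item labels (`ndvi=low`, `soil=sandy`), the toy data and the function names `is_subsequence`, `support` and `classify` are all hypothetical and serve only as illustration.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]
Sequence = List[Itemset]


def seq(*steps: str) -> Sequence:
    """Build a sequence from space-separated item labels, one string per date."""
    return [frozenset(step.split()) for step in steps]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if every pattern itemset is included, in order, in a distinct
    itemset of the sequence (greedy leftmost matching)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a strictly later date
    return True


def support(pattern: Sequence, sequences: List[Sequence]) -> float:
    """Fraction of sequences that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def classify(sequence: Sequence, patterns_by_class: Dict[str, List[Sequence]]) -> str:
    """Return the class whose patterns match the sequence most often
    (ties go to the first class listed; an assumption of this sketch)."""
    scores = {
        label: sum(is_subsequence(p, sequence) for p in patterns)
        for label, patterns in patterns_by_class.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: a static attribute (soil) repeated at each date,
# plus a discretized temporal attribute (ndvi).
cultivated = [
    seq("soil=sandy ndvi=low", "soil=sandy ndvi=mid", "soil=sandy ndvi=high"),
    seq("soil=clay ndvi=low", "soil=clay ndvi=high", "soil=clay ndvi=mid"),
]
non_cultivated = [
    seq("soil=sandy ndvi=low", "soil=sandy ndvi=low", "soil=sandy ndvi=mid"),
    seq("soil=clay ndvi=mid", "soil=clay ndvi=mid", "soil=clay ndvi=low"),
]

# Hypothetical patterns, one per class.
patterns_by_class = {
    "cultivated": [seq("ndvi=low", "ndvi=high")],
    "non_cultivated": [seq("ndvi=mid", "ndvi=low")],
}

if __name__ == "__main__":
    pattern = patterns_by_class["cultivated"][0]
    print(support(pattern, cultivated), support(pattern, non_cultivated))
    print(classify(seq("ndvi=low", "ndvi=mid", "ndvi=high"), patterns_by_class))
```

A pattern whose support differs strongly between the two classes, as in this toy example, is the kind of pattern that can both separate the classes and be read directly by the analyst.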
Classification techniques based on frequent patterns have been adopted in Zaiane et al. (2002) and Chien and Chen (2010). Nevertheless, since these approaches are based on association rules, the temporal aspect is not taken into account within the classification

Engineering Applications of Artificial Intelligence 37 (2015) 91–102
http://dx.doi.org/10.1016/j.engappai.2014.09.001
0952-1976/© 2014 Elsevier Ltd. All rights reserved.
* Corresponding author.