Auton Robot (2009) 26: 171–186
DOI 10.1007/s10514-009-9114-2

Clustering sensor data for autonomous terrain identification using time-dependency

Philippe Giguere · Gregory Dudek

Received: 7 November 2008 / Accepted: 6 March 2009 / Published online: 19 March 2009
© Springer Science+Business Media, LLC 2009

Abstract In this paper we are interested in autonomous vehicles that can automatically develop terrain classifiers without human interaction or feedback. A key issue is the clustering of time-series data collected by the sensors of a ground-based vehicle moving over several terrain surfaces (e.g. concrete or soil). In this context, we present a novel off-line windowless clustering algorithm that exploits time-dependency between samples. In terrain coverage, the sets of sensory measurements returned are spatially, and hence temporally, correlated. Our algorithm works by finding a set of parameter values for a user-specified classifier that minimize a cost function. This cost function is related to the change in classifier probability estimates over time. The main advantage over other existing methods is its ability to cluster data for fast-switching systems that either have high process or observation noise, or have complex distributions that cannot be properly characterized within the time interval that the system stays in a single state. The algorithm was successfully evaluated using three different classifiers (linear separator, mixture of Gaussians and k-Nearest Neighbor), on both synthetic data sets and data from two different mobile robotic platforms. Comparisons are provided against a window-based algorithm and against a hidden Markov model trained with Expectation-Maximization, with positive results.

Keywords Terrain identification · Unsupervised learning · Clustering · Mobile robots · Legged robots · Machine learning · Hidden Markov model

P. Giguere · G. Dudek
Centre for Intelligent Machines, McGill University, Montreal, Quebec, Canada H3A 2A7
e-mail: philg@cim.mcgill.ca

G. Dudek
e-mail: dudek@cim.mcgill.ca

1 Introduction

Identifying local terrain properties has recently become a problem of increasing interest and relevance for unmanned ground vehicles. Terrain identification has been proposed using both non-contact sensors and tactile feedback. Being able to identify terrain types is important, since terrain properties directly affect the navigability, odometry and localization performance of such vehicles. As part of our research, we are interested in using simple sensors such as accelerometers and actuator feedback information to help discover and identify terrain types autonomously. Real terrains can vary widely, and contact forces also vary with the locomotion strategy (or gait, for a legged vehicle), making the sensors' response difficult to model and predict analytically. Therefore, this problem seems well suited to statistical, data-driven approaches.

We approach the problem using unsupervised learning (clustering) of samples that represent sequences of consecutive measurements from the vehicle as it travels, possibly moving from one terrain type to another. Since these signals are generated by a physical system interacting with a continuous or piecewise-continuous terrain, time-dependency will be present between consecutive samples. The clustering algorithm we propose explicitly exploits this time-dependency. It is a single-stage batch method that finds the global description of a cluster, in contrast to moving time-window methods, which detect transitions through a local description of the distributions estimated within the moving window. The algorithm has been developed for noisy systems (i.e., systems with overlapping clusters), as well as for systems that change state frequently (e.g., a vehicle traversing different terrain types in quick succession).
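The abstract states only that the algorithm searches for classifier parameters minimizing a cost tied to how much the classifier's probability estimates change over time; the exact cost is not given in this section. The following is a minimal sketch under our own assumptions, not the authors' formulation: the cost is taken to be the mean squared change in p(class 1 | x_t) between consecutive samples, divided by the overall variance of those probabilities (so that giving every sample the same probability is not a trivial minimum). The classifier is a logistic linear separator, the data are a hypothetical two-terrain synthetic series, and the optimizer is plain stochastic hill climbing. All function names (`make_switching_series`, `time_dependency_cost`, `fit`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probabilities(params, samples):
    # Linear separator: p(class 1 | x) = sigmoid(w . x + b), params = [w..., b].
    *w, b = params
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in samples]


def time_dependency_cost(params, samples):
    # Assumed cost: mean squared change of p between consecutive samples,
    # normalized by the global variance of p. A constant classifier has
    # zero variance and is rejected with an infinite cost.
    p = probabilities(params, samples)
    mean = sum(p) / len(p)
    var = sum((pi - mean) ** 2 for pi in p) / len(p)
    if var < 1e-12:
        return float("inf")
    local = sum((p[t] - p[t - 1]) ** 2 for t in range(1, len(p))) / (len(p) - 1)
    return local / var


def fit(samples, dim, iters=3000, step=0.5, seed=0):
    # Greedy stochastic hill climbing over [w, b]: keep a random perturbation
    # only if it lowers the cost. A stand-in for a proper optimizer.
    rng = random.Random(seed)
    best = [rng.gauss(0, 1) for _ in range(dim)] + [0.0]
    best_cost = time_dependency_cost(best, samples)
    for _ in range(iters):
        cand = [v + rng.gauss(0, step) for v in best]
        c = time_dependency_cost(cand, samples)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


def make_switching_series(n_segments=10, seg_len=20, noise=0.5, seed=1):
    # Hypothetical data: two "terrains" with 2-D feature means (-1, 0) and
    # (1, 0), alternating every seg_len samples, plus Gaussian noise.
    rng = random.Random(seed)
    samples, labels = [], []
    for s in range(n_segments):
        state = s % 2
        mx = 1.0 if state else -1.0
        for _ in range(seg_len):
            samples.append((mx + rng.gauss(0, noise), rng.gauss(0, noise)))
            labels.append(state)
    return samples, labels


if __name__ == "__main__":
    samples, labels = make_switching_series()
    params, cost = fit(samples, dim=2)
    pred = [int(p > 0.5) for p in probabilities(params, samples)]
    acc = sum(a == b for a, b in zip(pred, labels)) / len(labels)
    # Cluster labels are only defined up to a permutation.
    print(f"cost={cost:.3f} accuracy={max(acc, 1 - acc):.2f}")
```

Because the cost is evaluated over the whole sequence at once, no window length has to be chosen, which is the sense in which such a method is "windowless". Within this sketch, trying a mixture of Gaussians or k-Nearest Neighbor instead of the linear separator would only require replacing `probabilities`.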
In Sect. 2, we present an overview of related work on the subject, pointing out some