A Self-supervised Architecture for Moving Obstacles Classification

Roman Katz, Bertrand Douillard, Juan Nieto and Eduardo Nebot
ARC Centre of Excellence for Autonomous Systems
Australian Centre for Field Robotics
The University of Sydney, Sydney, NSW 2006, Australia
{r.katz}@cas.edu.au

Abstract— This work introduces a self-supervised, multi-sensor architecture that performs automatic classification of moving obstacles. Our approach presents a hierarchical scheme that relies on the “stability” of a subset of features given by a sensor to perform an initial robust classification based on unsupervised techniques. The obtained results are used as labels to train a set of supervised classifiers, which can then be combined to improve the final classification accuracy. The proposed architecture is general and can indeed be instantiated in a variety of ways, using different sensors and classifiers. The applicability and validity of the proposed architecture are evaluated for a particular realization based on range and visual information, which achieves 83% accuracy without using manually labeled data. Experimental results also demonstrate how accuracy can be maintained through self-training capabilities when working conditions change.

I. INTRODUCTION

Accurate classification of dynamic obstacles from a moving vehicle is a vital component in any architecture developed to achieve some kind of autonomy (e.g. the DARPA Urban Challenge [1]) or to provide situation awareness information to drivers [2]. In either case, the classes of the involved agents usually determine different responses or levels of assessment related to the situation. The class information could, for instance, be integrated in the global navigation architecture so that different classes can be used adequately in obstacle avoidance, mapping or tracking modules. Moreover, these classes can be used to trigger and manage the corresponding alarms or actions in driver assistance systems for commercial cars.
Perception and classification of moving obstacles in urban environments is a particularly challenging task, for several reasons. First of all, many different obstacles (each with particular dynamics) may be involved in this scenario, such as pedestrians, cars, trucks, bikes, etc. The observer must be able to detect these various agents from different positions and angles and cope with occlusion, while at the same time accounting for its own dynamics. Environmental conditions can degrade performance, not only in perception but also in classification. Even when a classifier is well posed for some particular scenario (the one for which it has been trained), changing conditions may affect its accuracy once it operates beyond the range over which it can generalize.

In this work, we present our progress towards the deployment of a self-supervised, multi-sensor architecture that performs automatic classification of moving obstacles. One of the main goals of sensor fusion in robotics is to integrate complementary characteristics of different sensors and techniques in order to improve the overall performance of a system. This is one of the principles of this work: to fuse one sensor particularly well suited to perform a task with others that complement the architecture and provide more robustness. The second core idea in this work is the necessity of an unsupervised module in our architecture. This is motivated not only by the constraints linked to hand labeling in supervised techniques, but also by the need for a classifier that adapts its internal models when the working conditions change beyond its generalization capabilities. We propose a hierarchical architecture that relies on the “stability” of a subset of features given by a sensor to perform an initial robust classification based on unsupervised techniques.
The obtained results are then used as labels to train a set of supervised classifiers that are finally combined to improve the final classification accuracy. The architecture is general and can indeed be instantiated in a variety of ways, using different sensors and classifiers. In this work, however, we demonstrate the applicability and validity of the concepts for a particular instance using range and visual information. Laser information is processed first, and labels are obtained in an unsupervised manner. The generated labels train a set of supervised classifiers, based on the laser itself and on visual information from a monocular color camera.

The structure of this paper is as follows. We first present related work in Section II. Our general self-supervised approach is introduced in Section III, where the details of the proposed architecture are described for a particular instance based on laser and visual information. Section IV illustrates the performance of the system through experiments. Conclusions and future work are finally discussed in Section V.

II. RELATED WORK

There has been extensive research on object classification in machine learning and robotics, in terms of supervised and unsupervised classifiers and their different possible combinations. Supervised approaches are usually more accurate than unsupervised ones because the training data contain the input vectors together with the corresponding labels. Unsupervised classification, on the other hand, does not use any labeling information and must act only on the input vectors, usually by clustering groups within the data [3], [4].
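The label-transfer idea underlying the self-supervised pipeline can be illustrated with a minimal sketch: an unsupervised clusterer labels a "stable" feature subset, and those labels then train a supervised classifier on a second feature set. This is only a schematic illustration using k-means and logistic regression on synthetic data (via scikit-learn and NumPy); it is not the authors' implementation, and the feature names are hypothetical stand-ins for the laser and visual features described in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical "stable" sensor features: two well-separated groups,
# standing in for the robust laser-derived feature subset.
stable_feats = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])

# Hypothetical second modality (e.g. visual features) observed for the
# same obstacles; here simulated as a noisy copy of the stable features.
visual_feats = stable_feats + rng.normal(0.0, 0.1, size=stable_feats.shape)

# Stage 1: unsupervised classification on the stable feature subset
# produces pseudo-labels without any manual annotation.
pseudo_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(stable_feats)

# Stage 2: the generated labels supervise a classifier on the other modality.
clf = LogisticRegression().fit(visual_feats, pseudo_labels)

# Agreement of the supervised classifier with its self-generated labels.
acc = clf.score(visual_feats, pseudo_labels)
```

In a full system the supervised stage would be evaluated on held-out data and combined with further classifiers; the sketch only shows the two-stage labeling structure.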