Image and Vision Computing 14 (1996) 353-363

Structure from motion techniques applied to crop field mapping

J.M. Sanchiz^a, F. Pla^a, J.A. Marchant^b, R. Brivot^b

^a Department of Computer Science, University Jaume I, 12071 Castellon, Spain
^b Silsoe Research Institute, Wrest Park, Silsoe, Beds MK45 4HS, UK

Received 10 July 1995; revised 24 October 1995

Abstract

Some agricultural tasks performed in a crop field consist of applying chemical treatment to the plants. To automate these tasks, a vision system can be used as a sensing device, mounted together with a treating device at the rear of an agricultural vehicle. The vision system tracks each plant through a sequence of images. With this arrangement the position of each plant with respect to the treating device can be derived, which specifies when and where to apply the chemical spray. A map of the field has to be recovered in the local area between the vision sensor and the treating device. This implies recovering the motion parameters of the vehicle, recording its trajectory, and placing the plants on the map. A method to identify the plants, a shape description to match them and a tracking strategy are presented. The motion parameters are recovered from the plant correspondences, using a method that finds the motion of a planar patch. A Kalman filter is used to integrate the different observations of each plant and to place them on the map. Results with real image sequences are presented, including zigzag and rotational movement.

Keywords: Motion analysis; Tracking; Kalman filter

1. Introduction

Some agricultural tasks performed in crop fields consist of applying chemical treatment to the plants. To automate these tasks, a vision system and a treating device could be mounted on the rear of an agricultural vehicle. While the vehicle moves, the vision system has to identify and track the plants in order to apply the treatment accurately.
The system could be mounted as a trolley towed by a manually driven vehicle, or it could be part of an autonomous agricultural vehicle. In the latter case, the motion parameters recovered by the system would also be useful information for the guidance control system. The exact positions of the plants with respect to the vehicle would have to be known, which implies building a map of the field from the different observations of the plants relative to the position of the vehicle. Information from the tracking system would be used by a task planning module that would decide which plant has to be treated and when.

Agricultural task automation is one of the practical applications of vehicle navigation. Vision-based guidance methods for vehicle navigation have been reported in recent years for road following [1-4] and indoor navigation [5,6]. The problem of analysing camera motion from a sequence of images has also been widely studied. In vehicle navigation or robot motion analysis applications, the scene is assumed to be static while the camera moves. Feature-based methods use collections of matched features in pairs of consecutive images [7-12]. Features are usually points or lines, and the matching between features is made using some similarity measurement, for example, correlation of a small window around corners. Camera motion analysis is closely related to the problems of token tracking and structure from motion. The Kalman filter and the extended Kalman filter have been used to integrate the different observations of the features and to estimate their positions or depths [13-16].

In this application there are no man-made objects to focus on. Typical scenes consist of a piece of field with some plants, weeds and soil. First, a segmentation process is applied to separate these three classes.
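The idea of integrating repeated observations with a Kalman filter can be sketched with a minimal scalar example. This is only an illustration of the update step, not the formulation used in this paper (whose state includes the vehicle motion); the position values and variances below are invented for the example:

```python
def kalman_update(x, P, z, R):
    """One Kalman filter measurement update for a static scalar state.

    x, P : current estimate of the state and its variance
    z, R : new observation and its variance
    """
    K = P / (P + R)          # Kalman gain: weight given to the new observation
    x_new = x + K * (z - x)  # blend the estimate with the observation
    P_new = (1.0 - K) * P    # uncertainty shrinks with each observation
    return x_new, P_new

# Fuse several noisy sightings of one plant's along-row position (metres).
observations = [1.02, 0.97, 1.01, 0.99]
x, P = observations[0], 0.05   # initialise from the first sighting
for z in observations[1:]:
    x, P = kalman_update(x, P, z, R=0.05)
```

With the initial variance equal to the observation variance, the filter reduces to the running mean of the sightings (here 0.9975 m), while the shrinking variance P records the growing confidence in the plant's mapped position.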
To get a good contrast between vegetation and soil, grey level images of the field are taken with an infrared filter; the images are then segmented with a grey level threshold [17]. The two-dimensional regions obtained from the segmentation process are the input data to the vision system. These regions form the plants. Since our main interest is to solve the practical