Can You Pick a Broccoli? 3D-Vision Based Detection and Localisation of Broccoli Heads in the Field

Keerthy Kusumam, Tomáš Krajník, Simon Pearson, Grzegorz Cielniak, Tom Duckett

Abstract— This paper presents a 3D vision system for robotic harvesting of broccoli using low-cost RGB-D sensors. The presented method addresses the tasks of detecting mature broccoli heads in the field and providing their 3D locations relative to the vehicle. The paper evaluates different 3D features, machine learning and temporal filtering methods for detection of broccoli heads. Our experiments show that a combination of Viewpoint Feature Histograms, a Support Vector Machine classifier and a temporal filter to track the detected heads results in a system that detects broccoli heads with 95.2% precision. We also show that the temporal filtering can be used to generate a 3D map of the broccoli head positions in the field.

Index Terms— robotic vision, RGB-D sensing, field robotics, automated harvesting

I. INTRODUCTION

Sustainable intensification of agriculture can be achieved through various technological innovations such as automated harvesting. Automated harvesting approaches bring benefits of reduced labour costs, economic sustainability, higher productivity, less waste and better use of natural resources. Selective harvesting methods choose only mature crops for harvesting, as opposed to “slaughter harvesting”, where an entire field is harvested in a single pass. Broccoli is one of the crops that demand selective harvesting, since the flowers exhibit a high variation in maturity levels, even when grown in the same field. To address these challenges, an automated selective harvesting machine would require an intelligent vision sensing unit that can detect and locate the harvestable broccoli heads. However, such systems encounter several difficulties arising from natural variations, partial views of the heads and occlusions due to leaves and weeds.
The main objective of this paper is to investigate the feasibility of using low-cost consumer 3D cameras to identify mature broccoli heads in real, unstructured outdoor field conditions, providing the locations of the detected heads in 3D image coordinates. The presented method applies state-of-the-art 3D feature extraction methods, machine learning, and temporal filtering to remove false positives and track the detected heads. Future work will address the problems of measuring the size of the detected broccoli heads to determine when a head is ready for harvest and the development of a cutting mechanism to physically harvest the crop.

The paper evaluates different 3D features, machine learning and temporal filtering methods for detection of mature broccoli heads. We show that a combination of Viewpoint Feature Histogram (VFH) and Support Vector Machine (SVM) detects the broccoli heads with 94.7% precision. Moreover, we demonstrate that integrating detection results across multiple frames (temporal filtering) prunes false positive detections, further improving the precision to 95.2%. We also demonstrate that the temporal filtering can be used to generate a 3D map of the broccoli head positions in the field.

All authors are with the University of Lincoln, UK. The work was funded by BBSRC and Innovate UK, project BB/N004841/1. Many thanks to Adam Turner for all his help with ground truthing the datasets used in this paper.

Fig. 1: System overview. Top: Tractor equipped with 3D sensors used for field data collection. Middle: RGB-D images of broccoli plants (left) are analysed for identifying head locations using 3D recognition algorithms (right). Bottom: Temporal filtering then combines detections from multiple frames to localise individual broccoli heads (plot titled “Partial map of the detected broccoli heads”, axes x [m] and y [m]).

II.
RELATED WORK

Several approaches can be identified in precision agriculture for detection, recognition, localisation and harvesting of different crop varieties. Analysis of 2D images acquired from high-resolution, industrial-grade cameras such as CCD is one of the most prominent approaches, as shown in [1], [2]. Several methods use colour analysis, such as the strawberry picking robot in [3] and apple harvesting using colour and shape features in [4]. Okamoto et al. [5] developed a citrus