Received: 20 November 2019 | Revised: 27 April 2020 | Accepted: 14 June 2020
DOI: 10.1002/rob.21975

REGULAR ARTICLE

The effect of data augmentation and network simplification on the image-based detection of broccoli heads with Mask R-CNN

Pieter M. Blok 1,2 | Frits K. van Evert 1 | Antonius P. M. Tielen 3 | Eldert J. van Henten 2 | Gert Kootstra 2

1 Agrosystems Research, Wageningen University & Research, Wageningen, The Netherlands
2 Farm Technology Group, Wageningen University & Research, Wageningen, The Netherlands
3 Greenhouse Horticulture, Wageningen University & Research, Wageningen, The Netherlands

Correspondence
Pieter M. Blok, Agrosystems Research, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands.
Email: pieter.blok@wur.nl

Funding information
Tony Wisdom (Skagit Valley Farm)

Abstract
In current practice, broccoli heads are selectively harvested by hand. The goal of our work is to develop a robot that can selectively harvest broccoli heads, thereby reducing labor costs. An essential element of such a robot is an image-processing algorithm that can detect broccoli heads. In this study, we developed a deep learning algorithm for this purpose, using the Mask Region-based Convolutional Neural Network (Mask R-CNN). To be applied on a robot, the algorithm must detect broccoli heads of any cultivar, meaning that it must generalize across broccoli images. We hypothesized that our algorithm could be generalized through network simplification and data augmentation. We found that network simplification decreased the generalization performance, whereas data augmentation increased it. Within data augmentation, the geometric transformations (rotation, cropping, and scaling) led to better image generalization than the photometric transformations (light, color, and texture). Furthermore, the algorithm generalized to a broccoli cultivar when 5% of the training images were images of that cultivar.
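The two augmentation families compared above differ in what they change. Geometric transformations move pixels, so for an instance-segmentation network such as Mask R-CNN the annotation masks must be transformed identically. Photometric transformations change only pixel values and leave the masks untouched. The NumPy sketch below illustrates this distinction. It is not the authors' augmentation pipeline: the function names, the 90° rotation steps, the nearest-neighbour scaling, the 90% crop fraction, the Gaussian-noise "texture" perturbation, and all parameter ranges are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical augmentation helpers (illustrative only, not the paper's code).

def rotate90(image, mask, k):
    """Geometric: rotate image and mask by k * 90 degrees so they stay aligned."""
    return np.rot90(image, k, axes=(0, 1)).copy(), np.rot90(mask, k, axes=(0, 1)).copy()

def scale_nearest(image, mask, factor):
    """Geometric: nearest-neighbour rescale of image and mask by `factor`."""
    h, w = mask.shape
    new_h, new_w = max(1, int(round(h * factor))), max(1, int(round(w * factor)))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return image[rows][:, cols], mask[rows][:, cols]

def random_crop(image, mask, crop_h, crop_w, rng):
    """Geometric: cut the same random window out of image and mask."""
    h, w = mask.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return (image[top:top + crop_h, left:left + crop_w],
            mask[top:top + crop_h, left:left + crop_w])

def adjust_light(image, gain):
    """Photometric: global brightness change; masks are not involved."""
    return np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def adjust_color(image, channel_gains):
    """Photometric: per-channel (RGB) gains as a crude colour shift."""
    out = image.astype(np.float32) * np.asarray(channel_gains, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

def add_texture_noise(image, sigma, rng):
    """Photometric: additive Gaussian noise as a simple texture perturbation."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def augment(image, mask, rng):
    """Apply hypothetical geometric, then photometric, augmentations to one sample."""
    # Geometric steps: each one is applied to the image AND its mask.
    image, mask = rotate90(image, mask, int(rng.integers(0, 4)))
    image, mask = scale_nearest(image, mask, float(rng.uniform(0.8, 1.2)))
    h, w = mask.shape
    image, mask = random_crop(image, mask, max(1, int(0.9 * h)), max(1, int(0.9 * w)), rng)
    # Photometric steps: only the image changes; the mask is returned unchanged.
    image = adjust_light(image, float(rng.uniform(0.7, 1.3)))
    image = adjust_color(image, rng.uniform(0.9, 1.1, size=3))
    image = add_texture_noise(image, sigma=5.0, rng=rng)
    return image, mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)  # stand-in RGB image
    mask = np.zeros((120, 160), dtype=np.uint8)
    mask[40:80, 60:100] = 1  # stand-in instance mask of one broccoli head
    aug_image, aug_mask = augment(image, mask, rng)
    print(aug_image.shape[:2] == aug_mask.shape)  # True: image and mask stay aligned
```

Applying a geometric transformation to an image but not to its mask would silently misalign the annotations, which is why the sketch routes both arrays through every geometric step and only the image through the photometric ones.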
Our algorithm detected 229 of the 232 harvestable broccoli heads from three cultivars. We also tested our algorithm on an online broccoli data set on which it had not previously been trained. On this data set, our algorithm detected 175 of the 176 harvestable broccoli heads, showing that the algorithm had generalized successfully. Finally, we performed a cost-benefit analysis for a robot equipped with our algorithm. We concluded that the robot was more profitable than human harvest and that our algorithm provided a sufficient basis for commercializing the robot.

KEYWORDS
agriculture, computer vision, learning, perception, sensors

1 | INTRODUCTION
In agriculture, numerous tasks depend on human labor. This labor is becoming increasingly expensive and scarce, which causes problems for tasks that are done by hand, such as the selective harvest of crops. Selective hand-harvest involves the visual assessment of the crop, followed by the harvest of only those specimens that have reached the desired size, quality, or maturity. One crop that is selectively harvested by hand is broccoli (Brassica oleracea var. italica). In the Netherlands, broccoli is usually hand-harvested three times in one growing season (Kwin, 2018). Cost studies show that the hand harvest of broccoli can take up to 107 man-hours per hectare and

J Field Robotics. 2020;1–20. wileyonlinelibrary.com/journal/rob © 2020 Wiley Periodicals LLC | 1