Vision-based Vineyard Trunk Detection and its Integration into a Grapes Harvesting Robot

Eftichia Badeka, Theofanis Kalampokas, Eleni Vrochidou, Konstantinos Tziridis, George A. Papakostas, Theodore P. Pachidis, Vassilis G. Kaburlasos
Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), Kavala, Greece
Email: {evbadek, theokala, evrochid, kenaaske, gpapak, pated, vgkabs}@cs.ihu.gr

Abstract—In this work, deep learning is employed for accurate and fast detection of vine trunks in vineyard images. More specifically, six well-known object detectors, Faster Region-based Convolutional Neural Network (Faster R-CNN), You Only Look Once version 3 (YOLOv3) and version 5 (YOLOv5), EfficientDet-D0, RetinaNet and MobileNet, are tested for real-time vine trunk detection. The models are trained with an in-house dataset designed for the needs of this study, containing 1927 manually annotated vine trunks in 899 different images. Comparative results indicate EfficientDet-D0 as the configuration that allows the fastest and most accurate vine trunk detection, achieving an Intersection over Union (IoU) of 71% and an overall Average Precision of 77.9% in 38 ms. The high precision, combined with the fast runtime performance, indicates the EfficientDet-D0 detector as the most suitable to be integrated into an autonomous harvesting robot for real-time vine trunk detection.

Index Terms—object detection, harvesting robot, deep learning, trunk detection, computer vision, precision agriculture, Cyber-Physical System (CPS)

I. INTRODUCTION

The wine industry has developed greatly in the last few decades [1]. In the wine industry, oenologists seek to maximize the quality of the harvested grapes, while field managers try to minimize all operational costs.
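As background for the metrics reported above, the following is a minimal sketch (not the authors' code) of how Intersection over Union is conventionally computed for two axis-aligned bounding boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is then typically counted as a true positive when its IoU with a ground-truth trunk box exceeds a chosen threshold, which is how the Average Precision figures quoted above are usually derived.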
These two opposite objectives are met in the implementation of viticultural practices: on one hand, there are the annual canopy management practices aiming at maintaining and improving vineyards' health, leading to optimized wine quality, while on the other hand, there is the mechanization of these practices by agricultural robots, namely agrobots, aiming at reducing all labor costs [2]. Agrobots are capable of longer working hours, since an autonomous and automatic robot may outlast a human worker, and they can increase productivity, application accuracy and operational safety [3]. In the aforementioned context, agrobots are adopted to perform a variety of vineyard management practices, including pruning, defoliation or green harvest [4]. Our interest here is in the development of an autonomous robot for grape harvest, namely ARG, able to support viticulture tasks such as harvest, cluster thinning (green harvest) and basal leaves removal (defoliation) [5]. ARG is designed as a Cyber-Physical System (CPS), integrating intelligence, communication and functionality, towards sensor awareness and decision making. In this context, ARG needs to navigate in the vineyard and to detect the vine trees so as to perform the selected viticulture tasks. In vineyards, especially those built on steep slopes, there are several challenges regarding robot navigation and localization, mainly due to terrain irregularities and to inaccuracies of the signals emitted by the Global Navigation Satellite System (GNSS), which is usually used for these purposes. Feature-based localization, i.e., the extraction of reliable and persistent features or landmarks from vineyards, is therefore considered. Knowledge of vineyard patterns is currently the most accurate, cheap and fast solution to facilitate agricultural tasks that need to be precise. Vine trunks can be selected as stable landmarks that exist in all vineyards.

Manuscript received August 10, 2020; revised January 12, 2021.
It makes sense to provide the robot with the ability to recognize vine trunks as high-level features of vineyards, to be used in localization and mapping procedures. More specifically, detection of the vine trunks can help in building a precise vineyard map on which the agricultural robot may rely to navigate safely and perform a wide range of agricultural tasks. Moreover, locating the vine trunk is the first step in automatically controlling the position and orientation of the robot in order to execute basal defoliation, and in centering on the vine to perform harvest or green harvest evenly on both sides. Therefore, vine trunks need to be located precisely for two main reasons: 1) to facilitate the navigation of ARG in the vineyard corridors and 2) to locate the working point of ARG for the performance of the selected viticulture tasks. The problem of vine trunk detection is challenging due to the fact that during both the basal defoliation and green harvest seasons, vineyard corridors and vine trunks are occluded by shoots and leaves [6], making it difficult either to determine the vineyard corridors or to discriminate the vine trunks (Fig. 1).

374 International Journal of Mechanical Engineering and Robotics Research Vol. 10, No. 7, July 2021
© 2021 Int. J. Mech. Eng. Rob. Res
doi: 10.18178/ijmerr.10.7.374-385