Vision-based Vineyard Trunk Detection and its
Integration into a Grapes Harvesting Robot
Eftichia Badeka, Theofanis Kalampokas, Eleni Vrochidou, Konstantinos Tziridis, George A. Papakostas,
Theodore P. Pachidis, Vassilis G. Kaburlasos
Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic
University (IHU), Kavala, Greece
Email: {evbadek, theokala, evrochid, kenaaske, gpapak, pated, vgkabs}@cs.ihu.gr
Abstract—In this work, deep learning is employed for
accurate and fast detection of vine trunks in vineyard
images. More specifically, six well-known object detectors,
Faster Region-based Convolutional Neural Network (Faster
R-CNN), You Only Look Once version 3 (YOLOv3) and
version 5 (YOLOv5), EfficientDet-D0, RetinaNet, and
MobileNet, are tested for real-time vine trunk detection. The
models are trained with an in-house dataset designed for the
needs of this study, containing 1927 manually annotated
vine trunks in 899 different images. Comparative results
indicate EfficientDet-D0 as the configuration that allows the
fastest and most accurate vine trunk detection, achieving an
Intersection over Union (IoU) of 71% and an overall Average
Precision of 77.9% in 38 ms. The high precision, combined
with the fast runtime performance, indicates the EfficientDet-D0
detector as the most suitable to be integrated into an
autonomous harvesting robot for real-time vine trunk
detection.
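For clarity, the Intersection over Union (IoU) metric reported above can be sketched as follows. This is a generic illustration in Python, not code from the present work; it assumes axis-aligned bounding boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # Union = sum of areas minus the double-counted intersection.
    return inter / (area_a + area_b - inter)
```

A predicted trunk box is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (commonly 0.5), which is how detection precision figures such as those above are computed.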
Index Terms—object detection, harvesting robot, deep
learning, trunk detection, computer vision, precision
agriculture, Cyber-Physical System (CPS)
I. INTRODUCTION
The wine industry has developed greatly over the last few
decades [1]. In the wine industry, oenologists seek to
maximize the quality of the harvested grapes, while field
managers try to minimize all operational costs. These two
opposing objectives meet in the implementation of
viticultural practices; on one hand, there are the annual
canopy management practices aiming at maintaining and
improving vineyards’ health, leading to optimized wine
quality, while on the other hand, there is the
mechanization of these practices by agricultural robots,
namely agrobots, aiming at reducing all labor costs [2].
Agrobots are capable of longer working hours, since an
autonomous and automatic robot may outlast a human
worker, and they increase productivity, application accuracy,
and operational safety [3]. In the aforementioned context,
agrobots are adopted to perform a variety of vineyard
management practices, including pruning, defoliation or
green harvest [4]. Our interest here is in the development
of an autonomous robot for grape harvest, namely ARG,
able to support viticulture tasks such as harvest, cluster
thinning (green harvest), and basal leaf removal
(defoliation) [5]. ARG is designed as a Cyber-Physical
System (CPS), integrating intelligence, communication,
and functionality towards sensor awareness and decision
making. In this context, ARG needs to navigate in the
vineyard and detect the vine trees in order to perform the
selected viticulture tasks.
Manuscript received August 10, 2020; revised January 12, 2021.
In vineyards, especially those built on steep slopes,
there are several challenges regarding robots'
navigation and localization, mainly due to terrain
irregularities and inaccuracies in the signals emitted by
the global navigation satellite system (GNSS), which is
usually used for these purposes. Feature-based
localization, i.e., the extraction of reliable and persistent
features or landmarks from vineyards, is therefore
considered. Knowledge of vineyard patterns is
currently the most accurate, cheapest, and fastest solution
for facilitating agricultural tasks that require precision. Vine
trunks can be selected as stable landmarks that exist in all
vineyards. It therefore makes sense to provide the robot with the
ability to recognize vine trunks as high-level features of
vineyards, to be used in localization and mapping procedures.
More analytically, detection of the vine trunks can help in
building a precise vineyard map that the agricultural
robot may rely on, to navigate safely and perform a wide
range of agricultural tasks. Moreover, locating the vine
trunk is the first step to automatically control the position
and orientation of the robot in order to execute basal
defoliation, and to center on the vine to perform harvest
or green harvest evenly spaced on both sides. Therefore,
vine trunks need to be located precisely for two main
reasons: 1) to facilitate the navigation of ARG in the
vineyard corridors and 2) to locate the working point of
ARG regarding the performance of the selected
viticulture tasks.
The problem of vine trunk detection is challenging
because, during both the basal defoliation and green
harvest seasons, vineyard corridors and vine trunks are
occluded by shoots and leaves [6], making it difficult
either to determine the vineyard corridors or to discriminate
the vine trunks (Fig. 1).
International Journal of Mechanical Engineering and Robotics Research Vol. 10, No. 7, July 2021
© 2021 Int. J. Mech. Eng. Rob. Res
doi: 10.18178/ijmerr.10.7.374-385