CIGR-AgEng conference, Jun. 26–29, 2016, Aarhus, Denmark

Semantic Segmentation of Mixed Crops using Deep Convolutional Neural Network

Anders Krogh Mortensen a,*, Mads Dyrmann b, Henrik Karstoft c, Rasmus Nyholm Jørgensen c, René Gislum d

a Department of Agroecology, Aarhus University, Aarhus, Denmark
b The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
c Department of Engineering, Aarhus University, Aarhus, Denmark
d Department of Agroecology, Aarhus University, Slagelse, Denmark
* Corresponding author. Email: anmo@agro.au.dk

Abstract

Estimation of in-field biomass and crop composition is important for both farmers and researchers. Using close-up, high-resolution images of the crops, crop species can be distinguished using image processing. In the current study, deep convolutional neural networks for semantic segmentation (or pixel-wise classification) of cluttered classes in RGB images were explored in the case of catch crops and volunteer barley. The dataset consisted of RGB images from a plot trial using oil radish as a catch crop in barley. The images were captured using a high-end consumer camera mounted on a tractor. The images were manually annotated into 7 classes: oil radish, barley, weed, stump, soil, equipment and unknown. Data augmentation was used to artificially enlarge the dataset by transposing and flipping the images. A modified version of the VGG-16 deep neural network was used. First, the last fully-connected layers were converted to convolutional layers and their depth was modified to match our number of classes. Secondly, a deconvolutional layer with a stride of 32 was added between the last fully-connected layer and the softmax classification layer to ensure that the output has the same spatial size as the input. Preliminary results using this network show a pixel accuracy of 79% and a frequency weighted intersection over union of 66%.
These preliminary results indicate great potential in deep convolutional networks for segmentation of plant species in cluttered RGB images.

Keywords: Pixel-wise classification, deep learning, computer vision

1. Introduction

Estimating total crop biomass or individual crop components in a mixed cropping system is important for agricultural research and farming. For example, the assessment of grass and/or clover biomass in a forage-based feed system can be used to optimize animal feeding plans. Moreover, quantification of autumn catch crop biomass coupled with the nitrogen (N) concentration makes it possible to estimate the N uptake. Nitrogen concentration can be measured rapidly through its relationship with chlorophyll content. Information on N uptake in autumn catch crops within and between fields can be used to differentiate spring N application(s) in the following crop, which should facilitate a higher N utilization and possibly reduce N leaching to ground and surface water.

Mounting a camera on either an unmanned aerial vehicle or a tractor can provide the farmer or researcher with high-resolution, close-up images of the field and its crops. When processed, these images can provide an estimate of the biomass and N uptake in the field (Mortensen et al., 2015). In a field with mixed crops, an important first step in estimating either total crop or individual crop component biomass is the ability to distinguish the different crop components. Current methods are traditionally based on hand-crafted features and/or morphology and are, as a result, either very slow (Mortensen et al., 2015) or of very limited capacity (Himstedt, 2009). Recent developments in deep learning, and in particular deep convolutional neural networks, have shown impressive results in tasks such as image classification, speech recognition and, lately, semantic segmentation (or pixel-wise classification).
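For reference, the two segmentation metrics reported in the abstract, pixel accuracy and frequency weighted intersection over union, can be computed from a class confusion matrix, following the definitions used by Long et al. (2015). The sketch below is illustrative only (the function and variable names are ours, not from the paper) and assumes every class occurs at least once in the ground truth:

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute pixel accuracy and frequency weighted IoU from a confusion
    matrix, where conf[i, j] counts pixels of true class i predicted as
    class j. Assumes every class occurs at least once."""
    conf = conf.astype(float)
    total = conf.sum()
    tp = np.diag(conf)                       # correctly classified pixels per class
    pixel_accuracy = tp.sum() / total
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    iou = tp / union                         # per-class intersection over union
    freq = conf.sum(axis=1) / total          # fraction of pixels in each true class
    fw_iou = (freq * iou).sum()              # frequency weighted IoU
    return pixel_accuracy, fw_iou
```

For a toy 2-class confusion matrix [[3, 1], [1, 5]], this yields a pixel accuracy of 0.8; the reported 79% and 66% correspond to these two quantities computed over all annotated test pixels.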
In this paper, we explore convolutional neural networks for semantic segmentation in the context of mixed crops. In contrast to regular scenes, images of mixed crops are often much more complex and cluttered, but contain fewer classes. We used a state-of-the-art convolutional network architecture for semantic segmentation (Long et al., 2015) to explore the potential and challenges of this case. The performance was evaluated on images of an oil radish plot trial containing barley, grass, weed, stump and soil.

2. Materials and Methods

2.1. Image acquisition

The dataset consists of images acquired from a plot experiment at Foulum Research Center, Denmark. The full plot experiment consisted of 36 plots (9 treatments with 4 repetitions), but only the repetitions of one of the treatments were photographed. The photographed plots (3 m x 15 m) contained oil radish as a catch crop along with some volunteer barley, grass, weed and stump. The plots were photographed approximately every week over a period of 8 weeks.

A Sony a7 with a 35 mm lens was used to photograph the plots. The camera was mounted on a