Roman Bruch*, Rüdiger Rudolf, Ralf Mikut, and Markus Reischl

Evaluation of semi-supervised learning using sparse labeling to segment cell nuclei

https://doi.org/10.1515/cdbme-2020-3103

Abstract: The analysis of microscopic images from cell cultures plays an important role in the development of drugs. The segmentation of such images is a basic step to extract the valuable information on which further evaluation steps are built. Classical image processing pipelines often fail under heterogeneous conditions. In recent years, deep neural networks have gained attention due to their great potential in image segmentation. One main pitfall of deep learning is the amount of labeled data required to train such models. Especially for 3D images, the process of generating such data is tedious and time-consuming, which is seen as a possible reason why deep learning models are not yet established for 3D data. Efforts have been made to minimize the time needed to create labeled training data or to reduce the amount of labels needed for training. In this paper we present a new semi-supervised training method for the segmentation of microscopic cell recordings, based on an iterative approach that utilizes unlabeled data during training. This method helps to further reduce the amount of labels required to effectively train deep learning models for image segmentation. By labeling less than one percent of the training data, a performance of 90% compared to a full annotation with 342 nuclei can be achieved.

Keywords: Sparse labeling, Deep learning, Iterative training, Semi-supervised learning, Semantic segmentation

1 Introduction

Cell cultures can be used to examine the effectiveness and selectivity of an anti-cancer drug without the need to sacrifice animals. A large part of such studies relies on the evaluation of microscopic images, since they offer a wealth of information.
From basic matters like the proliferation of cells up to more advanced aspects like the state of individual cells, many questions can be answered based on cell imaging. To extract this information in a quantitative and objective manner, algorithms are needed. The segmentation of nuclei in microscopic images is a fundamental step on which further actions like cell counting or the co-localization of other fluorescent markers depend. Algorithms like built-in FIJI plugins [13], CellProfiler [9], Mathematica pipelines [14], TWANG [15] and XPIWIT [1] are well established for this task. Typically, these pipelines require properties such as object size and shape and therefore have to be reparameterized for different recording conditions or cell lines. In extreme cases, such as the segmentation of apoptotic cells, parameterization is not sufficient and special algorithms need to be designed [8].

In recent years, deep learning models like the U-Net [11] have gained attention in the biological field due to their great modeling power: U-Nets can outperform classical segmentation methods [3], but they require rich training data sets.

Newly emerging 3D cell cultures represent the living organism more closely than 2D cultures. Data sets are given as stacked image series. Furthermore, new difficulties such as decreasing brightness along the z-axis arise. Classical approaches that are robust against intensity fluctuations exist, but suffer from parameterization [15].

*Corresponding author: Roman Bruch, Institute of Molecular and Cell Biology, Faculty of Biotechnology, Mannheim University of Applied Sciences, Mannheim, Germany, e-mail: r.bruch@hs-mannheim.de
Rüdiger Rudolf, Institute of Molecular and Cell Biology, Faculty of Biotechnology, Mannheim University of Applied Sciences, Mannheim, Germany
Ralf Mikut, Markus Reischl, Institute for Automation and Applied Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
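To make the parameterization problem concrete, consider a minimal classical pipeline in the spirit of the tools cited above (this sketch is not taken from any of them; the threshold and minimum-size values are hypothetical): a global intensity threshold, connected-component labeling, and a size filter. Each of these parameters has to be re-tuned when the recording conditions or the cell line change.

```python
from collections import deque

def segment_nuclei(image, threshold=0.5, min_size=3):
    """Toy classical pipeline: global threshold, 4-connected
    component labeling, and a minimum-size filter. Both
    `threshold` and `min_size` encode assumptions about the
    recording and must be reparameterized per data set."""
    h, w = len(image), len(image[0])
    mask = [[image[y][x] > threshold for x in range(w)] for y in range(h)]
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                # Flood-fill one connected component.
                component, queue = [], deque([(y, x)])
                labels[y][x] = next_label
                while queue:
                    cy, cx = queue.popleft()
                    component.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                # Reject objects below the assumed nucleus size.
                if len(component) < min_size:
                    for cy, cx in component:
                        labels[cy][cx] = 0
    return labels

# Hypothetical 4x5 intensity patch with two nuclei and one bright speck.
img = [[0.9, 0.9, 0.0, 0.0, 0.8],
       [0.9, 0.9, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.7, 0.7],
       [0.0, 0.0, 0.0, 0.7, 0.7]]
labels = segment_nuclei(img, threshold=0.5, min_size=3)
kept = sorted({v for row in labels for v in row if v})
```

With `min_size=3` the isolated bright pixel is discarded and two nuclei remain; with `min_size=1` the speck would survive as a false positive, illustrating how a single size prior decides what counts as an object.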
Deep learning methods like the U-Net can also be used for 3D data [12], but they have not become established because of the effort required to create 3D training data sets. The process of generating training data for 3D images is time-consuming and burdensome: Since the visualization is effectively limited to 2D slices, it is hard to conceive the object dimensions, which often leads to a loss of overview. In a fully manual approach, each plane in the 3D image has to be labeled individually. It is also challenging to achieve consistent segment borders over consecutive planes [6]. As an example, labeling a single 128×128×32 image patch containing 200 nuclei with the help of an interactive labeling method [16] took 7.5 h. To create a training data set with only ten image patches, 75 h would be needed.

Thus, many methods were developed to reduce the 3D labeling effort; they can be divided into three major approaches: interactive labeling [5, 16], weakly supervised learning [19] and artificial training data [2, 7, 17]. The goal of interactive labeling is to accelerate the annotation process by supporting the user in a semi-automatic manner. Weakly supervised learning uses different annotations like point or scribble annotations

DE GRUYTER Current Directions in Biomedical Engineering 2020;6(3): 20203103
Open Access. © 2020 Roman Bruch et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 License.
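The iterative use of unlabeled data mentioned in the abstract follows the general idea of self-training with pseudo-labels: train on the sparse labels, predict on unlabeled data, keep only confident predictions as additional labels, and retrain. The sketch below illustrates only this general loop, not the authors' actual method; the 1-D "model" (a learned intensity threshold), the confidence margin, and all sample values are hypothetical stand-ins for the U-Net and the image data.

```python
def fit_threshold(samples):
    """Hypothetical stand-in for network training: learn an
    intensity threshold as the midpoint between the mean
    foreground and mean background intensities."""
    fg = [v for v, y in samples if y == 1]
    bg = [v for v, y in samples if y == 0]
    return (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2

def self_training(labeled, unlabeled, rounds=3, margin=0.2):
    """Iterative semi-supervised loop: train, pseudo-label the
    confident unlabeled samples (far enough from the decision
    boundary), and retrain on labeled + pseudo-labeled data."""
    train = list(labeled)
    for _ in range(rounds):
        t = fit_threshold(train)
        confident = [(v, 1 if v > t else 0)
                     for v in unlabeled if abs(v - t) > margin]
        train = list(labeled) + confident
    return fit_threshold(train)

# Two hand-labeled samples plus five unlabeled intensities (all hypothetical).
labeled = [(0.9, 1), (0.1, 0)]
unlabeled = [0.8, 0.85, 0.2, 0.15, 0.6]
final_t = self_training(labeled, unlabeled)
```

The ambiguous sample (0.6) stays excluded by the confidence margin, while the clearly bright and dark samples are pseudo-labeled and reused, so later training rounds see more data than was ever annotated by hand.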