3036 Microsc. Microanal. 28 (Suppl 1), 2022
doi:10.1017/S1431927622011345 © Microscopy Society of America 2022
Development of a Flexible Ensemble Classification System for Microscopy
Tomas J. McIntee1, Mathieu Therezien1 and Zachary E. Russell1*
1. Ion Innovations, Boone, NC, United States.
* Corresponding author: Zach.Russell@ion-innovations.com
In the present work, a highly flexible ensemble classification system is proposed for broad use in
automatic identification of objects in microscopy applications. While a number of plug-and-play
solutions for machine learning and object classification exist, this platform is distinguished by being
specifically tuned for microscopy applications. This ensemble classifier utilizes a modular collection of
machine learning algorithms in conjunction with manually tagged data to identify objects with high
accuracy and a meaningful confidence level. The use of a meaningful confidence level in the output
identification allows for objects with low confidence to be flagged automatically for human review as
part of a larger classification software package (illustrated in Figure 2). This can then be used reflexively
to improve the quality of the training data used in the component classifiers [1].
At present, the system supports the use of different combinations of k-nearest neighbor (KNN), weighted
KNN, specialized weighted binary KNN, support vector machine (SVM), Gaussian naïve Bayesian
(GNB), GNB packs, decision tree, and random forest (RF) classifiers. Note that GNB packs and random
forests are themselves ensemble classifiers, respectively ensembles of Bayesian and decision tree
classifiers. Each of these classification systems is individually more effective at some identification
tasks than others [2, 3, 4].
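The modular pool of component classifiers described above could be sketched as follows. This is an illustrative assumption, not the authors' implementation: standard scikit-learn estimators stand in for the specialized weighted binary KNN and GNB-pack components, and the `build_component_classifiers` name is hypothetical.

```python
# Minimal sketch (not the authors' code) of a modular pool of component
# classifiers, each exposing a per-class confidence vector via predict_proba.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def build_component_classifiers():
    """Return a dict of named component classifiers.

    The specialized weighted-binary-KNN and GNB-pack variants described
    in the text are in-house components; ordinary stand-ins appear here.
    """
    return {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "weighted_knn": KNeighborsClassifier(n_neighbors=5, weights="distance"),
        "svm": SVC(probability=True),   # probability=True enables predict_proba
        "gnb": GaussianNB(),
        "tree": DecisionTreeClassifier(),
        "rf": RandomForestClassifier(n_estimators=100),
    }
```

Because every component shares the `predict_proba` interface, classifiers can be added or removed from the pool without changing the aggregation logic downstream.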
The classifier system relies on the use of a meaningful space of extracted object features, such as size,
brightness, moment, et cetera, which can be corrected or adjusted on the basis of imaging conditions
(e.g., magnification, exposure). This extraction process is carried out for both training data and for
objects under classification. The modified feature vectors associated with known identified objects are
used to train component classifiers. The classifiers may be trained on a subset of data or a subset of
features [5].
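The feature-correction step might take a form like the sketch below, assuming a simple feature vector of area, brightness, and moment; the specific correction factors (area scaling with magnification squared, brightness scaling linearly with exposure) are illustrative assumptions, as is the `correct_features` name.

```python
# Minimal sketch (assumed form, not the authors' code) of correcting raw
# extracted features for imaging conditions, so that training objects and
# objects under classification live in a comparable feature space.
import numpy as np

def correct_features(raw, magnification, exposure):
    """Adjust a raw feature vector (area_px, mean_brightness, moment)
    into condition-independent units.
    """
    area_px, brightness, moment = raw
    return np.array([
        area_px / magnification**2,  # physical-scale area (assumed scaling)
        brightness / exposure,       # exposure-normalized brightness (assumed)
        moment,                      # treated as scale-invariant (assumed)
    ])
```

The same correction is applied to both the manually tagged training data and the incoming objects before any classifier sees them.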
The ensemble classifier uses a high-level decision tree to choose which classifier to apply to
the object. The results of each classification (e.g., distinguishing optical artifacts from genuine objects)
then inform what type of classification should be applied next. In the simplest case, illustrated in Figure
2, all classifiers are applied regardless of intermediate classifications, and decision architecture is
implemented in the aggregation process.
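The routing behavior of the high-level decision tree could be sketched as below. This is an assumed control flow, not the authors' code: an artifact screen runs first, and its outcome determines whether a further, specialized classifier is consulted. The function and component names are hypothetical.

```python
# Minimal sketch (assumed control flow, not the authors' code) of the
# high-level decision tree: the result of the artifact screen determines
# what type of classification is applied next.
def classify_with_routing(features, artifact_clf, particle_clf):
    """Route an object through classifiers in sequence.

    Both arguments are hypothetical components exposing a
    predict_proba-style interface: predict_proba([x]) -> [[p0, p1, ...]].
    """
    p_artifact = artifact_clf.predict_proba([features])[0]
    if p_artifact[1] > 0.5:           # index 1 = "optical artifact" (assumed)
        return "artifact", p_artifact[1]
    # Genuine object: consult the downstream object classifier.
    probs = particle_clf.predict_proba([features])[0]
    best = max(range(len(probs)), key=probs.__getitem__)
    return f"class_{best}", probs[best]
```

In the simplest configuration described in the text, this routing is skipped entirely: all classifiers run, and the decision architecture lives in the aggregation step instead.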
Each component classifier has been modified to produce a meaningful vector of confidences
associated with each possible identification, rather than a single categorical label. At the
end of the process, each object under classification has been associated with a sequence of confidence
vectors, which can then be aggregated in several different ways to produce a final identification with an
associated confidence. The simplest aggregation process supported is taking an arithmetic or geometric
mean of the classification vectors. Various ordinal voting methods and ordinal statistical analyses can
also be used to produce a meaningful aggregate confidence value in the final identification by
highlighting when an object is similar to two different categories [6, 7].
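The simplest aggregation paths mentioned above, arithmetic and geometric means of the confidence vectors, could be sketched as follows; the `aggregate_confidences` name and renormalization step are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of aggregating per-classifier
# confidence vectors into a final identification with an aggregate confidence.
import numpy as np

def aggregate_confidences(conf_vectors, method="arithmetic"):
    """conf_vectors: array-like of shape (n_classifiers, n_classes).

    Returns (winning_class_index, aggregate_confidence).
    """
    V = np.asarray(conf_vectors, dtype=float)
    if method == "arithmetic":
        agg = V.mean(axis=0)
    elif method == "geometric":
        eps = 1e-12                      # guard against log(0)
        agg = np.exp(np.log(V + eps).mean(axis=0))
    else:
        raise ValueError(f"unknown method: {method}")
    agg = agg / agg.sum()                # renormalize to a distribution
    winner = int(agg.argmax())
    return winner, float(agg[winner])
```

One design consequence worth noting: the geometric mean sharply penalizes any class that even a single component classifier strongly rejects, so an object that resembles two categories ends up with a low aggregate confidence and is naturally flagged for human review.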