3036 Microsc. Microanal. 28 (Suppl 1), 2022
doi:10.1017/S1431927622011345 © Microscopy Society of America 2022
Development of a Flexible Ensemble Classification System for Microscopy
Tomas J. McIntee1, Mathieu Therezien1 and Zachary E. Russell1*
1. Ion Innovations, Boone, NC, United States.
* Corresponding author: Zach.Russell@ion-innovations.com
In the present work, a highly flexible ensemble classification system is proposed for broad use in
automatic identification of objects in microscopy applications. While a number of plug-and-play
solutions for machine learning and object classification exist, this platform is distinguished by being
specifically tuned for microscopy applications. This ensemble classifier utilizes a modular collection of
machine learning algorithms in conjunction with manually tagged data to identify objects with high
accuracy and a meaningful confidence level. The use of a meaningful confidence level in the output
identification allows for objects with low confidence to be flagged automatically for human review as
part of a larger classification software package (illustrated in Figure 2). This can then be used reflexively
to improve the quality of the training data used in the component classifiers [1].
At present, the system supports the use of different combinations of k-nearest neighbor (KNN), weighted
KNN, specialized weighted binary KNN, support vector machine (SVM), Gaussian naïve Bayesian
(GNB), GNB packs, decision tree, and random forest (RF) classifiers. Note that GNB packs and random
forests are themselves ensemble classifiers, respectively ensembles of Bayesian and decision tree
classifiers. Each of these classification systems is individually more effective at some identification
tasks than others [2, 3, 4].
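The modular pool of component classifiers described above could be sketched as follows. This is an illustrative assumption, not the authors' implementation: standard scikit-learn estimators stand in for the specialized weighted binary KNN and GNB-pack components, and the `build_component_classifiers` name is hypothetical.

```python
# Minimal sketch (not the authors' code) of a modular pool of component
# classifiers, each exposing a per-class confidence vector via predict_proba.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def build_component_classifiers():
    """Return a dict of named component classifiers.

    The specialized weighted-binary-KNN and GNB-pack variants described
    in the text are in-house components; ordinary stand-ins appear here.
    """
    return {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "weighted_knn": KNeighborsClassifier(n_neighbors=5, weights="distance"),
        "svm": SVC(probability=True),   # probability=True enables predict_proba
        "gnb": GaussianNB(),
        "tree": DecisionTreeClassifier(),
        "rf": RandomForestClassifier(n_estimators=100),
    }
```

Because every component shares the `predict_proba` interface, classifiers can be added or removed from the pool without changing the aggregation logic downstream.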
The classifier system relies on the use of a meaningful space of extracted object features, such as size,
brightness, moment, et cetera, which can be corrected or adjusted on the basis of imaging conditions
(e.g., magnification, exposure). This extraction process is carried out for both training data and for
objects under classification. The modified feature vectors associated with known identified objects are
used to train component classifiers. The classifiers may be trained on a subset of data or a subset of
features [5].
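The feature-correction step might take a form like the sketch below, assuming a simple feature vector of area, brightness, and moment; the specific correction factors (area scaling with magnification squared, brightness scaling linearly with exposure) are illustrative assumptions, as is the `correct_features` name.

```python
# Minimal sketch (assumed form, not the authors' code) of correcting raw
# extracted features for imaging conditions, so that training objects and
# objects under classification live in a comparable feature space.
import numpy as np

def correct_features(raw, magnification, exposure):
    """Adjust a raw feature vector (area_px, mean_brightness, moment)
    into condition-independent units.
    """
    area_px, brightness, moment = raw
    return np.array([
        area_px / magnification**2,  # physical-scale area (assumed scaling)
        brightness / exposure,       # exposure-normalized brightness (assumed)
        moment,                      # treated as scale-invariant (assumed)
    ])
```

The same correction is applied to both the manually tagged training data and the incoming objects before any classifier sees them.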
The ensemble classifier uses a high-level decision tree to choose which classifier to apply to
the object. The results of each classification (e.g., distinguishing optical artifacts from genuine objects)
then inform what type of classification should be applied next. In the simplest case, illustrated in Figure
2, all classifiers are applied regardless of intermediate classifications, and decision architecture is
implemented in the aggregation process.
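The routing behavior of the high-level decision tree could be sketched as below. This is an assumed control flow, not the authors' code: an artifact screen runs first, and its outcome determines whether a further, specialized classifier is consulted. The function and component names are hypothetical.

```python
# Minimal sketch (assumed control flow, not the authors' code) of the
# high-level decision tree: the result of the artifact screen determines
# what type of classification is applied next.
def classify_with_routing(features, artifact_clf, particle_clf):
    """Route an object through classifiers in sequence.

    Both arguments are hypothetical components exposing a
    predict_proba-style interface: predict_proba([x]) -> [[p0, p1, ...]].
    """
    p_artifact = artifact_clf.predict_proba([features])[0]
    if p_artifact[1] > 0.5:           # index 1 = "optical artifact" (assumed)
        return "artifact", p_artifact[1]
    # Genuine object: consult the downstream object classifier.
    probs = particle_clf.predict_proba([features])[0]
    best = max(range(len(probs)), key=probs.__getitem__)
    return f"class_{best}", probs[best]
```

In the simplest configuration described in the text, this routing is skipped entirely: all classifiers run, and the decision architecture lives in the aggregation step instead.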
Each component classifier has been modified to produce a meaningful vector of confidences
associated with each possible identification, rather than a single categorical label. At the
end of the process, each object under classification has been associated with a sequence of confidence
vectors, which can then be aggregated in several different ways to produce a final identification with an
associated confidence. The simplest aggregation process supported is taking an arithmetic or geometric
mean of the classification vectors. Various ordinal voting methods and ordinal statistical analyses can
also be used to produce a meaningful aggregate confidence value in the final identification by
highlighting when an object is similar to two different categories [6, 7].
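The simplest aggregation paths mentioned above, arithmetic and geometric means of the confidence vectors, could be sketched as follows; the `aggregate_confidences` name and renormalization step are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of aggregating per-classifier
# confidence vectors into a final identification with an aggregate confidence.
import numpy as np

def aggregate_confidences(conf_vectors, method="arithmetic"):
    """conf_vectors: array-like of shape (n_classifiers, n_classes).

    Returns (winning_class_index, aggregate_confidence).
    """
    V = np.asarray(conf_vectors, dtype=float)
    if method == "arithmetic":
        agg = V.mean(axis=0)
    elif method == "geometric":
        eps = 1e-12                      # guard against log(0)
        agg = np.exp(np.log(V + eps).mean(axis=0))
    else:
        raise ValueError(f"unknown method: {method}")
    agg = agg / agg.sum()                # renormalize to a distribution
    winner = int(agg.argmax())
    return winner, float(agg[winner])
```

One design consequence worth noting: the geometric mean sharply penalizes any class that even a single component classifier strongly rejects, so an object that resembles two categories ends up with a low aggregate confidence and is naturally flagged for human review.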