Biomedical Image Classification with Random Subwindows and Decision Trees

Raphaël Marée, Pierre Geurts, Justus Piater, and Louis Wehenkel
GIGA/CBIG, Montefiore Institute, University of Liège, Belgium

Abstract
In this paper, we address a biomedical image classification problem: the automatic classification of x-ray images into 57 predefined classes with large intra-class variability. To this end, we apply and slightly adapt a recent generic method for image classification based on ensembles of decision trees and random subwindows [MGPW05]. We obtain classification results close to the state of the art on a publicly available database of 10000 x-ray images. We also provide some clues for interpreting the classification of each image in terms of subwindow relevance.

Image Classification
⊲ Goal: given a set of training images labelled with a finite number of classes, the goal of an automatic image classification method is to build a model that accurately predicts the class of new, unseen images.
⊲ Biomedical applications: organize large-scale image databases into categories without limitation to a specific diagnostic study, set up clinical diagnosis tools, provide high-throughput cell phenotype screening, ...
⊲ Some existing solutions:
• A pre-processing "feature extraction" step, specific to the particular problem and application domain
• Extracted features used as new input variables for traditional learning algorithms (e.g., nearest-neighbor or neural-network classifiers)

Random Subwindows and Decision Trees
⊲ Concepts
• Extraction of a large number of possibly overlapping square subwindows, of random sizes and at random positions
• Pixel-based description with scale normalization
• Tree-based ensemble machine learning methods
• Successfully applied to household objects, buildings, landscape themes, handwritten digits, faces, ...
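The subwindow extraction and scale normalization above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the toy gradient image are invented for the example, nearest-neighbor interpolation stands in for the (unspecified) resizing scheme, and images are plain 2-D lists of grayscale values.

```python
import random

def extract_subwindows(image, n_subwindows, out_size=16, rng=None):
    """Extract square subwindows of random size at random positions from a
    2-D grayscale image (list of lists), each rescaled to out_size x out_size
    by nearest-neighbor interpolation (a stand-in for the scale
    normalization described above)."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    subwindows = []
    for _ in range(n_subwindows):
        # Random side length, at a random position fully inside the image.
        side = rng.randint(1, min(h, w))
        y = rng.randint(0, h - side)
        x = rng.randint(0, w - side)
        # Nearest-neighbor resize to out_size x out_size.
        resized = [
            [image[y + (i * side) // out_size][x + (j * side) // out_size]
             for j in range(out_size)]
            for i in range(out_size)
        ]
        subwindows.append(resized)
    return subwindows

# Toy 32 x 32 gradient "image", just to exercise the function.
img = [[r + c for c in range(32)] for r in range(32)]
wins = extract_subwindows(img, n_subwindows=5)
print(len(wins), len(wins[0]), len(wins[0][0]))  # 5 16 16
```

Each normalized 16 × 16 subwindow then yields 256 pixel values that serve directly as input attributes for the tree-based learners.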
⊲ Learning stage
[Figure: training images of classes C1, C2, C3 and the subwindows extracted from them]
• Class-balanced extraction of N_w subwindows: from each training image of class c, we extract N_w / (m · nb_c) subwindows, where m is the number of classes and nb_c the number of training images of class c
• Subwindows resized to a fixed size (16 × 16 pixels)
• Each subwindow labelled with the class of its parent image
[Figure: ensemble of T decision trees T1–T5]
• A subwindow classification model built using supervised methods
• Ensembles of T decision trees: Tree Boosting, Extra-Trees [GEW06]

Random Subwindows and Decision Trees (continued)
⊲ Prediction stage
[Figure: a test subwindow propagated through trees T1–T5; per-class vote counts are summed over subwindows, yielding predicted class C2]
• Extraction of N_w,test subwindows from the test image
• Propagation of each subwindow through each tree
• Aggregation of tree votes: the image is assigned the majority class among the classes assigned to its subwindows

Dataset: IRMA Challenge
⊲ Description
• 10000 x-ray images (courtesy of TM Lehmann, RWTH Aachen, Germany, http://www.irma-project.org)
• 57 classes according to the IRMA code: different modalities, orientations, body parts, and biological systems
⊲ Examples
[Figure: images from the "coronal, pelvis, musculosceletal" class; images from 7 cranium/cervical spine classes]

Protocol and Results
⊲ Protocol and parameters
• Training set: 9000 images; test set: 1000 images ([iCS05])
• N_w = 800000, T = 25, N_w,test = 500
⊲ Misclassification error rate

  Method                                   Error rate
  1-NN + IDM [KGN04]                       12.6%
  1-NN + CCF + IDM + Tamura                13.3%
  Discriminative patches [DKN05]           13.9%
  Random Subwindows + Tree Boosting        14.0%
  MI1 Confidence                           14.6%
  Random Subwindows + Extra-Trees          14.7%
  Gift 5NN8g                               20.6%
  ...
  Nearest Neighbor, 32 × 32, Euclidean     36.8%
  ...
  Texture directionality                   73.3%

⊲ Computational efficiency
• Training time is on the order of T · N_w · log N_w
• Prediction time is essentially proportional to T · N_w,test · log N_w

Interpretability
• Well-classified subwindows could bring potentially useful information about their class

Conclusion
• We applied our generic method [MGPW05] to a specific biomedical task
• We obtained results competitive with state-of-the-art algorithms without tedious adaptation, which confirms the potential of the approach for a wide range of applications
• The possibility of extracting interpretable information from images has been highlighted

References
[DKN05] T. Deselaers, D. Keysers, and H. Ney. Discriminative training for object recognition using image patches. In Proc. International Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 157–162, June 2005.
[GEW06] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Accepted for publication in Machine Learning Journal, 2006.
[iCS05] Proc. of the Cross Language Evaluation Forum (CLEF), Springer Lecture Notes in Computer Science, to appear, 2005.
[KGN04] D. Keysers, C. Gollan, and H. Ney. Classification of medical images using non-linear distortion models. In Bildverarbeitung für die Medizin (BVM), pages 366–370, March 2004.
[MGPW05] R. Marée, P. Geurts, J. Piater, and L. Wehenkel. Random subwindows for robust image classification. In C. Schmid, S. Soatto, and C. Tomasi, editors, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 34–40. IEEE, June 2005.
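The vote aggregation of the prediction stage can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is invented, and the hard-coded vote lists are hypothetical stand-ins for the class votes that the T trained trees would actually return for each subwindow.

```python
from collections import Counter

def aggregate_votes(per_subwindow_votes):
    """Prediction-stage aggregation: each subwindow of the test image
    receives one class vote from each of the T trees; votes are summed
    over all subwindows and the majority class is returned."""
    totals = Counter()
    for votes in per_subwindow_votes:  # one list of T tree votes per subwindow
        totals.update(votes)
    return totals.most_common(1)[0][0]

# Hypothetical votes from T = 5 trees on 3 subwindows of one test image.
votes = [
    ["C2", "C2", "C1", "C2", "C3"],
    ["C2", "C1", "C2", "C2", "C2"],
    ["C3", "C2", "C2", "C1", "C2"],
]
print(aggregate_votes(votes))  # C2
```

Note that ties and vote weighting are ignored here; the poster only specifies that the majority class over all subwindow votes is assigned to the image.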