Binary Plankton Image Classification Using Random Subspace

Feng Zhao, Xiaoou Tang, and Feng Lin
Department of Information Engineering
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong
{fzhao0, xtang, flin0}@ie.cuhk.edu.hk

Scott Samson and Andrew Remsen
College of Marine Science
University of South Florida
Saint Petersburg, Florida, U.S.A.
{samson, aremsen}@marine.usf.edu

Abstract: In this paper, we implement a random subspace based algorithm to classify the plankton images detected in real time by the Shadowed Image Particle Profiling and Evaluation Recorder. Such classification is difficult because the data sets are much noisier than those used in most previous research, and the plankton are deformable, projection-variant, and often partially occluded. In addition, the images in our experiments are binary and therefore lack texture information. Using random sampling, we construct a set of stable classifiers that together take advantage of nearly all the discriminative information in the feature space of plankton images. The combination of multiple stable classifiers performs better than a single classifier. We achieve over 93% classification accuracy on a collection of more than 3000 images, which is comparable to what a trained biologist can achieve using conventional manual techniques.

I. INTRODUCTION

Plankton, including phytoplankton and zooplankton, form the base of the food chain in the ocean and are a fundamental component of marine ecosystem dynamics. Rapid mapping of plankton abundance, together with taxonomic and size composition, can help oceanographic researchers understand how climate change and human activities affect marine ecosystems. Earlier researchers investigated the temporal and spatial variability in plankton abundance and composition by manually counting samples collected with traditional methods (e.g., towed nets, pumps, and Niskin bottles), which is laborious and time consuming.
To improve sampling efficiency, new instruments such as the Video Plankton Recorder (VPR) [1], the HOLOMAR underwater holographic camera system [2], and the Shadowed Image Particle Profiling and Evaluation Recorder (SIPPER) [3] have been developed to continuously sample magnified plankton images in the ocean.

The experimental data sets in this work come from the SIPPER system recently developed by the University of South Florida. The SIPPER images differ from those used in most previous research in four aspects: 1) the images are much noisier, 2) the objects are deformable and often partially occluded, 3) the images are projection variant, i.e., they are video records of 3D objects in arbitrary positions and orientations, and 4) the images in our experiments are binary and therefore lack texture information. Fig. 1 shows some typical examples to illustrate the diversity of the SIPPER images.

To deal with these difficulties, we combine general features [4] (e.g., moment invariants [5], Fourier descriptors [6], and granulometric features [7]) with specific features [8] (e.g., circular projections, boundary smoothness, and object density) to form a more complete description of the binary plankton patterns. To remove redundancy and reduce noise, we use Principal Component Analysis (PCA) to compact the combined feature vectors, with the eigenvectors corresponding to small eigenvalues removed from the PCA subspace [4][8]. Since these discarded eigenvectors may still encode information useful for recognition, their removal may cause a loss of discriminative information. To solve this problem, we propose an approach using random subspace [9], which has been shown to be very effective for face recognition [11]. In the random subspace method, a number of low-dimensional subspaces are generated by randomly sampling from the original high-dimensional feature space.
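As a rough illustration of the random subspace idea described above, the following minimal sketch draws random subsets of feature coordinates, trains one simple classifier per subset, and combines the predictions. It rests on several assumptions that are not taken from the paper: the subspaces are sampled directly from raw feature coordinates, the base classifier is a nearest-centroid rule, and the combination rule is a plain majority vote. All function names are hypothetical.

```python
import random
from collections import Counter


def sample_subspaces(n_features, subspace_dim, n_subspaces, seed=0):
    """Draw n_subspaces random index subsets, each of size subspace_dim."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_dim))
            for _ in range(n_subspaces)]


def project(x, indices):
    """Keep only the selected coordinates of feature vector x."""
    return [x[i] for i in indices]


def fit_centroids(X, y, indices):
    """Class centroids in one subspace (assumed stand-in base classifier)."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(indices))
        for k, v in enumerate(project(x, indices)):
            acc[k] += v
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict_one(x, indices, centroids):
    """Label of the nearest class centroid in the given subspace."""
    p = project(x, indices)

    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return min(centroids, key=lambda label: sqdist(centroids[label]))


def fit_ensemble(X, y, subspace_dim, n_subspaces, seed=0):
    """Train one classifier per randomly sampled subspace."""
    subspaces = sample_subspaces(len(X[0]), subspace_dim, n_subspaces, seed)
    return [(idx, fit_centroids(X, y, idx)) for idx in subspaces]


def predict_ensemble(ensemble, x):
    """Majority vote over the per-subspace predictions (assumed rule)."""
    votes = Counter(predict_one(x, idx, c) for idx, c in ensemble)
    return votes.most_common(1)[0][0]
```

A call such as `fit_ensemble(X, y, subspace_dim=3, n_subspaces=5)` followed by `predict_ensemble(ensemble, x)` classifies a new feature vector `x`. Because each classifier sees a different subset of coordinates, the ensemble as a whole can draw on most of the feature space even though every member works in low dimension.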
Finally, multiple classifiers constructed in the random subspaces are combined to make a more powerful decision [10]. Because of the random sampling, the constructed classifiers are stable, and together they cover nearly the entire feature space without losing much discriminative information. Thus, good performance can be achieved. Experiments on seven classes of more than 3000 binary plankton images demonstrate the efficiency and superiority of our algorithm.

II. FEATURE EXTRACTION

To form a more complete description of the binary plankton patterns, we combine general features such as moment invariants, Fourier descriptors (FD and filled FD), and granulometries with specific features such as circular projections (CMS, P1, P2, filled P1, and filled P2), boundary smoothness, and object density. In this work, we add three types of structuring elements (square, disk, and rhombus) of increasing sizes to compute the granulometric features, since granulometries are relatively robust to noise, occlusion, and projection direction. All the extracted features are invariant to translation, scale, and rotation. A brief description of them is given in Table I; refer to [4]-[8] for details.

0-7803-9134-9/05/$20.00 ©2005 IEEE
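To make the notion of an invariant feature from Section II concrete, the sketch below computes the first two of Hu's classical moment invariants for a binary image. Whether the paper uses exactly these two quantities is an assumption (see [5] for the features actually used), and the function names are hypothetical. The image is assumed to be a list of rows of 0/1 pixel values.

```python
def central_moment(img, p, q):
    """Central moment mu_pq of a binary image (list of rows of 0/1)."""
    pts = [(x, y) for y, row in enumerate(img)
           for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("image has no foreground pixels")
    m00 = len(pts)
    xbar = sum(x for x, _ in pts) / m00
    ybar = sum(y for _, y in pts) / m00
    # Subtracting the centroid makes the moment translation invariant.
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in pts)


def normalized_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^(1 + (p + q) / 2), which normalizes for scale."""
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)


def hu_first_two(img):
    """First two Hu invariants, built from second-order normalized moments."""
    eta20 = normalized_moment(img, 2, 0)
    eta02 = normalized_moment(img, 0, 2)
    eta11 = normalized_moment(img, 1, 1)
    phi1 = eta20 + eta02
    phi2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return phi1, phi2
```

On a pixel grid, these values are unchanged by integer translations and by rotations through multiples of 90 degrees. Invariance to scaling and to arbitrary rotation angles holds only approximately because of discretization.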