A MULTIRESOLUTION ENHANCEMENT TO GENERIC CLASSIFIERS
OF SUBCELLULAR PROTEIN LOCATION IMAGES
Thomas Merryman¹, Keridon Williams³, Gowri Srinivasa², Amina Chebira² and Jelena Kovačević²,¹
¹Electrical and Computer Engineering and ²Biomedical Engineering
Carnegie Mellon University, Pittsburgh, PA
³Dept. of Science and Mathematics,
University of Virgin Islands, St Thomas, VI
ABSTRACT
We propose an algorithm for the classification of fluorescence mi-
croscopy images depicting the spatial distribution of proteins within
the cell. The problem is at the forefront of the current trend in bi-
ology towards understanding the role and function of all proteins.
The importance of protein subcellular location was pointed out by
Murphy, whose group produced the first automated system for clas-
sification of images depicting these locations, based on diverse fea-
ture sets and combinations of classifiers. With the addition of the
simplest multiresolution features, the same group obtained the high-
est reported accuracy of 91.5% for the denoised 2D HeLa data set.
Here, we aim to improve upon that system by adding the true power
of multiresolution—adaptivity. In the process, we build a system
able to work with any feature sets and any classifiers, which we de-
note as a Generic Classification System (GCS). Our system consists
of multiresolution (MR) decomposition in the front, followed by fea-
ture computation and classification in each subband, yielding local
decisions. This is followed by the crucial step of combining all those
local decisions into a global one, while at the same time ensuring
that the resulting system does no worse than a no-decomposition
one. Using a nondenoised data set, a much smaller number of features
(a combination of texture and Zernike moment features), and a neural
network classifier, we obtain a high accuracy of 89.8%, effectively
showing that the space-frequency localized information in the
subbands adds to the discriminative power of the system.
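As a minimal illustration of combining local decisions into a global one, the following sketch uses a weighted sum of per-subband class scores. The weighting rule, the function name `combine_local_decisions`, and the toy scores are assumptions made for illustration; they are not taken from the system described here. The sketch shows one reason a weighted combination need not do worse than a no-decomposition classifier: putting all the weight on the undecomposed image reproduces that classifier's decision.

```python
import numpy as np

def combine_local_decisions(local_probs, weights):
    """Combine per-subband class scores into one global class index.

    local_probs: array of shape (num_subbands, num_classes); row 0 is
                 assumed to come from the undecomposed image.
    weights:     array of shape (num_subbands,), one weight per subband.
    """
    local_probs = np.asarray(local_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    scores = weights @ local_probs  # weighted sum over subbands
    return int(np.argmax(scores))

# Toy scores for 3 subbands and 3 classes (hypothetical numbers).
local_probs = np.array([
    [0.6, 0.3, 0.1],  # undecomposed image
    [0.2, 0.7, 0.1],  # subband 1
    [0.1, 0.8, 0.1],  # subband 2
])

root_only = np.array([1.0, 0.0, 0.0])  # ignores the subbands entirely
uniform = np.ones(3) / 3               # treats all decisions equally

decision_root = combine_local_decisions(local_probs, root_only)
decision_uniform = combine_local_decisions(local_probs, uniform)
```

With `root_only` the global decision equals the decision made on the undecomposed image, so a weight search that includes this setting can always fall back to the no-decomposition result on the data used to choose the weights.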
1. LOCATION PROTEOMICS AND
MULTIRESOLUTION ANALYSIS
Among the most important goals in biological sciences today is to
understand the role and function of all proteins. One of the criti-
cal aspects of a protein’s activity and function is its subcellular lo-
cation (PSL), that is, the spatial distribution of proteins within the
cell. Today’s method of choice to determine PSL is fluorescence mi-
croscopy, its success due in part to the advent of a range of new flu-
orescent probes used to tag proteins or molecules of interest, includ-
ing the nontoxic, green fluorescent protein (GFP). Once successfully
tagged, the cells are imaged using one of many models of fluores-
cence microscopes to produce a multidimensional data set—a bioim-
age. These bioimages can be just 2D slices or 3D cell/tissue vol-
umes (z-stacks). Acquiring bioimages at multiple time instants re-
sults in 3D movies (2D time series) or 4D data sets (z-stacks tracked
in time). Finally, these microscopes allow for imaging of multiple
fluorescence channels, bringing the possible dimensionality of the
This work was supported in part by the PA State Tobacco Settlement,
Kamlet-Smith Bioinformatics Grant.
[Figure 1: block diagram. Input Image -> MR Decomposition -> Feature Computation -> Classification -> Voting -> Class Label]
Fig. 1. MR enhancement to a generic classification system.
data set to 5D. With the enormous volume of such high-dimensional
data sets being generated, human analysis becomes time-consuming,
prone to error, and ultimately impractical, leading to the “holy grail”
for PSL bioimage interpretation and analysis: to develop a system for
fast, automatic, and accurate recognition of proteins based on their
subcellular location images.
Murphy et al. pioneered automated PSL interpretation and anal-
ysis, resulting in a system that can classify protein location patterns
with well-characterized reliability and better sensitivity than human
observers [1, 2]. This work was followed by [3, 4]. With the addition
of the simplest multiresolution features, in [1], the authors obtained
the highest reported accuracy of 91.5% for the denoised 2D HeLa
data set.
Problem Statement. The problem we are addressing is that
of classifying the spatial distribution patterns of selected proteins
within the cell. The challenge in this data set is that images from
the same class look different while those from different classes look
very similar (see Fig. 2). In the data sets we use, the proteins were
labeled using immunofluorescence, so we know the ground truth, that
is, which proteins were labeled and subsequently imaged. This is
useful for algorithm development, as it lets us test the accuracy of
our classification scheme.
Methodology. As the introduction of the simplest multiresolution
features produced a statistically significant jump in classification
accuracy, our aim is to explore more sophisticated multiresolution
techniques. In particular, we wish to explore the following three
characteristics of multiresolution:
(a) Localization: Fluorescence microscopy images have highly
localized features both in space and frequency. This leads us to MR
tools, as they have been found to be the most appropriate tools for
computing and isolating such localized features [5].
(b) Adaptivity: Given that we are designing a system to distin-
guish between classes of proteins, it is clear that an ideal solution is
adaptive, a property provided by MR techniques.
(c) Fast and Efficient Computation: It is well known that the
discrete wavelet transform has a computational cost of the order
O(n), where n is the input size, as opposed to the O(n log n) typical
of other linear transforms, including the FFT.
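The linear cost in (c) can be checked on the simplest wavelet. The sketch below is a minimal 1D Haar decomposition, chosen here only for illustration and not necessarily the transform used in this work. It counts one sum and one difference per sample pair, which gives 2(n - 1) operations in total for an input of length n. The function name `haar_dwt` and the operation-counting convention are assumptions.

```python
import numpy as np

def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a length-2^k signal.

    Returns (approx, details, ops): the final approximation coefficient,
    the detail coefficients ordered coarsest to finest, and the number
    of additions/subtractions performed.
    """
    approx = np.asarray(signal, dtype=float)
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    ops = 0
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # highpass branch
        approx = (even + odd) / np.sqrt(2.0)         # lowpass branch
        ops += 2 * approx.size  # one sum and one difference per pair
    return approx, details[::-1], ops

# Each level halves the data, so the work is n + n/2 + ... + 2 = 2(n - 1).
x = np.arange(8, dtype=float)
approx, details, ops = haar_dwt(x)
```

Because each level processes half as many samples as the one before, the total work is a geometric sum bounded by 2n, rather than growing with the number of levels.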
Our Philosophy. Based on the above arguments, we would like
to extract discriminative features within space-frequency localized
subspaces. These are obtained by MR decomposition; thus, instead
570 0-7803-9577-8/06/$20.00 ©2006 IEEE ISBI 2006