Mass Spectrometry Analysis Via Metaheuristic Optimization Algorithms
Syarifah Adilah M.Y.¹, Ibrahim Venkat², Rosni Abdullah³, Umi Kalsom Yusof⁴
¹,²,³,⁴ School of Computer Sciences
Universiti Sains Malaysia
11800 Penang, Malaysia.
¹ Dept. of Computer Sciences and Mathematics
Universiti Teknologi MARA Pulau Pinang
13500 Penang, Malaysia.
¹syarifah.adilah@ppinang.uitm.edu.my, ²ibrahim@cs.usm.my, ³rosni@cs.usm.my and ⁴umiyousof@cs.usm.my
Abstract—Biologically inspired metaheuristic techniques for extracting salient features from mass spectrometry data have recently been gaining momentum in related fields of research, viz., bioinformatics and proteomics. Such sophisticated approaches provide efficient ways to mine voluminous mass spectrometry data and extract potential features by getting rid of redundant information. This feature extraction process ultimately aids in discovering disease-related protein patterns in complex mixtures that are easily obtained from biological fluids such as serum and urine. This article provides an overview of typical bio-inspired approaches of this kind.
Index Terms—metaheuristics; bioinformatics; feature selection; proteomics
I. INTRODUCTION
Analysis of biomarkers based on their diagnostic and prognostic potential has grown into an active area of bioinformatics-oriented cancer research [1]. Well-known mass spectrometry techniques such as Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry (MALDI-TOF-MS) and Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry (SELDI-TOF-MS) generate high-throughput proteomic patterns, i.e., protein profiles, from complex mixtures such as serum, urine and nipple aspirate fluid. Clinical researchers use these protein expression levels to identify new biomarkers. The output of this Mass Spectrometry (MS) analysis is a spectrum, which can be represented as an x-y plot of mass-to-charge ratio (m/z) versus ion intensity. The significant information in the spectrum lies in its intensity peaks and their corresponding m/z values. However, because MS data are high-dimensional, their analysis demands robust pattern recognition techniques that can cope with large amounts of redundant data.
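The peak-based view of a spectrum can be illustrated with a deliberately naive peak picker. This is a minimal sketch, assuming the spectrum arrives as two equal-length lists of m/z values and intensities; the function `pick_peaks` and its `min_intensity` threshold are hypothetical names chosen for illustration, not part of any method surveyed here. It keeps only strict local maxima above the threshold, which shows how a dense trace collapses to a handful of (m/z, intensity) features.

```python
from typing import List, Sequence, Tuple


def pick_peaks(
    mz: Sequence[float],
    intensity: Sequence[float],
    min_intensity: float,
) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) pairs at strict local maxima of the intensity trace.

    A point is kept when it is at least `min_intensity` and strictly higher
    than both of its neighbours; the first and last points are never peaks.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    peaks = []
    for i in range(1, len(intensity) - 1):
        y = intensity[i]
        if y >= min_intensity and y > intensity[i - 1] and y > intensity[i + 1]:
            peaks.append((mz[i], y))
    return peaks


if __name__ == "__main__":
    # Made-up toy spectrum: nine evenly spaced m/z points.
    mz = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0, 1002.5, 1003.0, 1003.5, 1004.0]
    intensity = [1.0, 3.0, 9.0, 4.0, 2.0, 2.5, 1.0, 6.0, 1.5]
    # The local maximum at m/z 1002.5 is dropped because 2.5 < 5.0.
    print(pick_peaks(mz, intensity, min_intensity=5.0))
    # prints: [(1001.0, 9.0), (1003.5, 6.0)]
```

Practical MS pipelines usually smooth the trace, subtract the baseline and estimate noise before picking peaks; this sketch also misses flat-topped peaks, because it requires a value strictly higher than both neighbours. Library routines such as SciPy's `scipy.signal.find_peaks` offer more robust criteria (for example height, prominence and width).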
Feature selection, the process of selecting a subset of the original features according to certain criteria, is an important and frequently used dimensionality reduction technique for data mining [2], [3]. It reduces the number of features, removes irrelevant, redundant or noisy data, and has immediate benefits for applications: it speeds up data mining algorithms and improves mining performance, such as predictive accuracy and the comprehensibility of results. In a biological context, the technique is also known as discriminative gene selection, which detects influential genes from DNA microarray experiments. In MS analysis, feature selection plays two vital roles. First, it drives a search for significant features that discriminate disease samples from control samples. Second, it helps to construct an appropriate classification model that enables the identification of potential biomarkers for further analysis.
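The two roles above can be sketched together in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any work surveyed here: samples are rows of a small intensity matrix, labels are 1 (disease) and 0 (control), each feature is scored on its own by an absolute Welch t-statistic (one common discriminative criterion, chosen here purely for illustration), and a nearest-centroid classifier is then built on the top-scoring subset. All function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Assumed layout: X[i][j] is the intensity of feature j (e.g. a peak) in
# sample i; y[i] is 1 for a disease sample and 0 for a control sample.


def welch_t(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Absolute Welch t-statistic; needs at least two samples per group."""
    se2 = (statistics.variance(cases) / len(cases)
           + statistics.variance(controls) / len(controls))
    diff = abs(statistics.mean(cases) - statistics.mean(controls))
    if se2 == 0.0:
        # No spread in either group: any mean difference separates them perfectly.
        return math.inf if diff > 0 else 0.0
    return diff / math.sqrt(se2)


def select_features(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Role 1: indices of the k features that best separate disease from control."""
    scored = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        controls = [row[j] for row, label in zip(X, y) if label == 0]
        scored.append((welch_t(cases, controls), j))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [j for _, j in scored[:k]]


def fit_centroids(X: List[List[float]], y: List[int],
                  features: List[int]) -> Dict[int, List[float]]:
    """Role 2: per-class mean vectors restricted to the selected features."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids: Dict[int, List[float]], row: List[float],
            features: List[int]) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    x = [row[j] for j in features]
    return min(centroids, key=lambda lab: sum(
        (xi - ci) ** 2 for xi, ci in zip(x, centroids[lab])))


if __name__ == "__main__":
    # Made-up toy data: only feature 2 differs between the two groups.
    X = [[5.0, 1.0, 9.0, 2.0], [4.0, 1.2, 8.5, 2.1], [6.0, 0.9, 9.5, 1.9],
         [5.0, 1.1, 2.0, 2.0], [6.0, 1.0, 2.5, 2.2], [4.0, 1.0, 1.5, 1.8]]
    y = [1, 1, 1, 0, 0, 0]
    chosen = select_features(X, y, k=1)
    model = fit_centroids(X, y, chosen)
    print(chosen, predict(model, [5.0, 1.0, 8.8, 2.0], chosen))
    # prints: [2] 1
```

Scoring each feature in isolation is cheap but ignores interactions between features; the subset selection strategies described in the literature evaluate features as a group instead.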
Feature selection algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking ranks the features by a metric and eliminates all features that do not achieve an adequate score. In contrast, subset selection searches the space of possible feature subsets for the optimal one; that is, it evaluates the suitability of a subset of features as a group. Subset selection algorithms can be further classified into three categories, viz., wrappers, filters and embedded methods [3]. Wrappers use a search algorithm to explore the space of possible feature subsets and evaluate each subset by running a model on it. Wrappers can be computationally expensive and carry a risk of overfitting to the model. However, this drawback can be reduced by incorporating heuristic techniques into the search process to approach an optimal subset. Filters follow a search approach similar to that of wrappers, but instead of evaluating each subset against a model, they evaluate a simpler filter criterion. Filter-based feature ranking techniques rank features independently, without involving any learning algorithm. Feature ranking consists of scoring each feature according to a particular method and then selecting features based on their scores. Filter methods are the most commonly applied techniques in bioinformatics studies because they are computationally simple and fast, and are independent of other analysis algorithms. They also allow features to be quantified and prioritized according to their scores, which is particularly important for biological
978-0-7695-4514-1/11 $26.00 © 2011 IEEE
DOI 10.1109/BIC-TA.2011.7
2011 Sixth International Conference on Bio-Inspired Computing: Theories and Applications