New Distance Measure for Microarray Gene Expressions using Linear Dynamic
Range of Photo Multiplier Tube
Shubhra Sankar Ray
Center for Soft Computing Research
Indian Statistical Institute
Kolkata, India
shubhra_r@isical.ac.in
Sanghamitra Bandyopadhyay
Machine Intelligence Unit
Indian Statistical Institute
Kolkata, India
sanghami@isical.ac.in
Sankar K. Pal
Center for Soft Computing Research
Indian Statistical Institute
Kolkata, India
sankar@isical.ac.in
Abstract
This paper presents a new distance measure for genes based on
their microarray expressions. The measure, called the
“Maxrange distance”, incorporates an experiment-specific
normalization factor in the computation of the distance. The
normalization factor depends on the linear dynamic range of
the photo multiplier tube (PMT) used for scanning the
fluorescence intensities of the gene expression values. The
superiority of this distance measure in the microarray gene
ordering problem has been extensively established on widely
studied microarray data sets by performing statistical tests.
1 Introduction
The recent advances in DNA array technologies have resulted
in a significant increase in the amount of genomic data [3, 2].
The most powerful and commonly used technique is the
microarray, which has enabled the simultaneous monitoring of
the expression levels of thousands of genes. Because of the
large quantity of information available from microarrays, it is
necessary to find an appropriate distance measure for genes
and to classify the data in order to draw initial conclusions
about the genes.
The present article deals with the tasks of measuring the
distance between genes and evaluating their biological
ordering in a clustering framework. The widely used measures
of global similarity between genes (where all the expression
values of a gene are taken into consideration) are the Pearson
correlation [3, 2] and the Euclidean distance [8]. In computing
the similarity, neither of these measures assigns appropriate
weights to gene expressions obtained from different types of
experiments, where the expressions differ by orders of
magnitude from one type to another. Consequently, gene
expression values in a lower dynamic range are dominated by
those in a higher dynamic range. A new measure between genes,
called the “Maxrange distance”, is defined in this article:
the distance between two genes for a particular type of
experiment is first normalized by a factor dependent on the
linear dynamic range of the photo multiplier tube (used for
scanning the fluorescence intensities of that experiment), and
the normalized distances are then summed to obtain a global
distance.
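The normalize-then-sum idea described above can be illustrated with a minimal sketch. This section does not give the exact formula, so the details here are assumptions: the per-experiment distance is taken to be Euclidean, and the names `maxrange_distance`, `groups`, and `norm_factor` are hypothetical. In particular, `norm_factor` stands in for the PMT-dependent factor, whose actual computation is not specified in this section.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance over all expression values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def maxrange_distance(x, y, groups, norm_factor):
    """Sum of per-experiment-type distances, each divided by its own factor.

    groups:      experiment type -> indices of its expression values (hypothetical layout)
    norm_factor: experiment type -> normalization factor (assumed to be given)
    """
    total = 0.0
    for name, indices in groups.items():
        # Distance restricted to one experiment type (Euclidean is an assumption).
        d = math.sqrt(sum((x[i] - y[i]) ** 2 for i in indices))
        total += d / norm_factor[name]
    return total

# Two illustrative genes measured in two experiment types whose values
# differ by orders of magnitude (made-up numbers, not real data).
gene_x = [0.1, 0.2, 100.0, 200.0]
gene_y = [0.2, 0.1, 110.0, 190.0]
groups = {"type_A": [0, 1], "type_B": [2, 3]}
norm_factor = {"type_A": 1.0, "type_B": 100.0}

print(euclidean(gene_x, gene_y))
print(maxrange_distance(gene_x, gene_y, groups, norm_factor))
```

With these made-up values the plain Euclidean distance is determined almost entirely by the two `type_B` values, whereas after normalization both experiment types contribute equally to the sum.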
The superiority of the proposed Maxrange distance measure
over the related measures is established by using each of them
with four different algorithms.
2 Gene Ordering Methods
Cluster analysis, ordering, and display of gene expression
patterns are considered to be useful tools to detect genes that
are co-expressed or implicated in similar cellular functions
[3, 2]. Hierarchical clustering approaches (single, complete
and average linkage) [3, 1] group gene expressions into trees
of clusters. They start with singleton sets and repeatedly
merge sets until all genes belong to a single set.
Hierarchical clustering does not determine unique clusters, so
the user has to decide which subtrees are clusters and which
are only part of a bigger cluster. In the framework of
hierarchical clustering, a gene ordering algorithm therefore
helps the user to identify clusters, and subclusters within big
clusters, by visual inspection of the clustered gene
expression data [1]. Moreover, genes that are adjacent in a
linear ordering are often functionally co-regulated and
involved in the same cellular process [2, 3], and biological
analysis is often done in the context of this linear ordering
[1].
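As a rough illustration of how a cluster tree induces a linear ordering, the following sketch builds an average-linkage tree and reads off its leaves from left to right. It is a toy under stated assumptions rather than any of the algorithms evaluated in this paper: the function names are hypothetical, the distance is plain Euclidean, and no attempt is made to choose among the orderings consistent with the tree.

```python
from itertools import combinations

def average_linkage_tree(profiles):
    """Merge the two closest clusters until one remains.

    Returns a nested tuple of gene indices, e.g. ((0, 2), (1, 3)).
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    def linkage(pair):
        # Average pairwise distance between the members of two clusters.
        (_, m1), (_, m2) = pair
        total = sum(dist(profiles[i], profiles[j]) for i in m1 for j in m2)
        return total / (len(m1) * len(m2))

    # Each cluster is (subtree, member indices); start from singleton sets.
    clusters = [(i, [i]) for i in range(len(profiles))]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]

def leaf_order(tree):
    """Left-to-right leaves of the tree: one linear ordering of the genes."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)

# Four illustrative one-dimensional profiles (made-up values).
profiles = [[0.0], [10.0], [1.0], [12.0]]
tree = average_linkage_tree(profiles)
print(tree, leaf_order(tree))
```

Swapping the two children of any internal node yields another ordering that is equally consistent with the same tree, so a binary tree over n genes admits 2^(n-1) such orderings. This is why a separate ordering step is needed on top of the clustering itself.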
Ideally, one would like to obtain a linear order of all
Proceedings of the International Conference on Computing: Theory and Applications (ICCTA'07)
0-7695-2770-1/07 $20.00 © 2007