A Least Squares Fitting-Based Modeling of Gene Regulatory Sub-networks

Ranajit Das¹, Sushmita Mitra¹, C.A. Murthy¹, and Subhasis Mukhopadhyay²

¹ Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700 108, India
{ranajit r,sushmita,murthy}@isical.ac.in
² Department of Bio-Physics, Molecular Biology and Bioinformatics, Calcutta University, Kolkata 700 009, India
sm.bmbg@gmail.com

Abstract. This paper presents a simple and novel least squares fitting-based modeling approach for the extraction of simple gene regulatory sub-networks from biclusters in microarray time series gene expression data. Preprocessing helps in retaining the strongly interacting gene regulatory pairs. The methodology was applied to public-domain data sets of Yeast, and the experimental results were biologically validated based on standard databases and information from the literature.

Keywords: Biclustering, transcriptional regulatory network, least squares, gene interaction network.

1 Introduction

In recent years, rapid development in DNA microarray technology has resulted in the parallel generation of expression data for thousands of genes, from various organisms, under several experimental conditions. Genome expression profiling of many organisms has been completed in the past few years. The latest Affymetrix gene chips accommodate 750,000 unique 25-mer oligonucleotide features constituting more than 28,000 mouse gene-level probe sets. It is known that mRNA profiles are prone to different kinds of noise and ambiguity, and may be unequally sampled over time. Time series gene expression data is also essentially under-determined, involving a large number of genes measured at very few time points. Clustering is one way of estimating such noisy expression data, by grouping co-expressed genes under the assumption that they are co-regulated. However, it is observed that a subset of genes is co-regulated and co-expressed over only a subset of the experimental conditions.
Biclustering (or co-clustering) aims at bringing out such local structure inherent in the gene expression data matrix. It amounts to performing feature selection and clustering simultaneously, in a space of reduced dimension [1]. Several distance measures have been employed to quantify the similarity among the co-expressed genes in a bicluster. However, it is to be noted that an apparent similarity of expression profiles between a pair of genes need not always signify direct regulation. It may denote an indirect co-regulation by other genes,

S. Chaudhury et al. (Eds.): PReMI 2009, LNCS 5909, pp. 165–170, 2009. © Springer-Verlag Berlin Heidelberg 2009