R.C. Jain et al. Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 4, Issue 4 (Version 5), April 2014, pp. 57-62

Survey on Mining Order-Preserving Sub-Matrices

Reeta Dangi, Department of Information Technology, SATI Vidisha, India
R.C. Jain, Director of SATI Vidisha, India
Vivek Sharma, Department of Information Technology, SATI Vidisha, India

Abstract: Order-preserving sub-matrices (OPSMs) have been shown to be useful in capturing concurrent patterns in data when the relative magnitudes of data items are more important than their exact values. For example, in analyzing gene expression profiles obtained from microarray experiments, the relative magnitudes are important both because they represent the change of gene activities across the experiments and because there is typically a high level of noise in the data that makes the exact values untrustworthy. To cope with data noise, repeated experiments are often conducted to collect multiple measurements.

Keywords: Order-preserving sub-matrices, Simultaneous clustering.

I. INTRODUCTION

In the bioinformatics community, large numbers of genes are studied using DNA microarray technology to obtain gene expression data. Gene expression data are usually organized as matrices in which each row represents a gene, each column represents an experimental sample, and each entry records the expression value of one gene under one sample. Analyzing expression data can reveal information about the genes; clustering, for instance, helps to identify different functional categories of genes. Among the various clustering approaches, mining order-preserving sub-matrices has proved useful for discovering groups of genes that share common functions.

Simultaneous clustering, usually called bi-clustering, co-clustering, two-way clustering or block clustering, is an important method in two-way data analysis.
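The order-preserving idea described in the abstract can be made concrete with a small check. This section does not state a formal definition, so the sketch below assumes a common one: a sub-matrix (a subset of gene rows and sample columns) is order-preserving if some ordering of its selected columns makes every selected row strictly increasing. The function name `is_order_preserving` is hypothetical, and the test is exact, so it does not model the noise or repeated measurements mentioned above.

```python
from typing import Sequence


def is_order_preserving(matrix: Sequence[Sequence[float]],
                        rows: Sequence[int],
                        cols: Sequence[int]) -> bool:
    """Return True if the sub-matrix rows x cols is order-preserving.

    Assumed definition: there is a permutation of `cols` under which every
    selected row is strictly increasing. Ties count as violations.
    """
    if not rows or len(cols) < 2:
        return True  # trivially order-preserving
    # If a valid permutation exists, it must sort the first row, so derive
    # the candidate column order from the first selected row.
    first = matrix[rows[0]]
    order = sorted(cols, key=lambda c: first[c])
    # Every selected row must be strictly increasing along that order.
    for r in rows:
        values = [matrix[r][c] for c in order]
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are samples.
    expr = [
        [1.0, 5.0, 3.0, 9.0],  # gene 0
        [2.0, 8.0, 4.0, 1.0],  # gene 1
        [0.5, 7.5, 2.5, 6.0],  # gene 2
    ]
    print(is_order_preserving(expr, [0, 1, 2], [0, 1, 2]))     # True
    print(is_order_preserving(expr, [0, 1, 2], [0, 1, 2, 3]))  # False
```

In the example, genes 0 to 2 rise in the same relative order over samples 0, 2 and 1 even though their absolute values differ, which is exactly the kind of pattern an OPSM captures; adding sample 3 breaks the shared order for gene 1.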
A number of algorithms that cluster the rows and columns of a matrix simultaneously have been proposed to date. The goal of simultaneous clustering is to find sub-matrices, that is, subsets of rows and subsets of columns that exhibit high correlation. Such algorithms have been proposed and used in many fields, such as bioinformatics [1], web mining [2], text mining [3] and social network analysis [4].

II. OVERVIEW OF THE SIMULTANEOUS CLUSTERING PROBLEM

Clustering is the grouping together of similar subjects. Standard clustering methods consider the value of each point in all dimensions in order to form groups of similar points. Such one-way clustering techniques are based on the similarity between subjects across all variables. Simultaneous clustering algorithms instead seek "blocks" of rows and columns that are interrelated. They aim to identify a set of bi-clusters B_k(I_k, J_k), where I_k is a subset of the rows X and J_k is a subset of the columns Y, such that the rows in I_k exhibit similar behavior across the columns in J_k, or vice versa, and every bi-cluster B_k satisfies some homogeneity criterion. A bi-clustering method may assume a specific structure and data type. In their survey [5], Madeira and Oliveira introduce several bi-clustering structures: a single bi-cluster, exclusive-row bi-clusters, exclusive-column bi-clusters, non-overlapping bi-clusters with a tree structure, and arbitrarily positioned overlapping bi-clusters. Bi-clusters can have constant values, constant values on rows or columns, coherent values, or coherent evolution.

Simultaneous clustering has several advantages over one-way clustering (Table 1). In particular, it can highlight the association between the row and column clusterings that emerges from the data analysis as a linked clustering. In addition, it allows the researcher to deal with sparse and high-dimensional data matrices [6].
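The bi-cluster notation B_k(I_k, J_k) and the bi-cluster types listed above can be illustrated with a small classifier. The sketch assumes idealized, noise-free additive forms: constant values a_ij = mu; constant rows a_ij = mu + alpha_i; constant columns a_ij = mu + beta_j; coherent values a_ij = mu + alpha_i + beta_j; and coherent evolution read in its order-preserving sense, where all rows share one column ordering. The helper names `submatrix` and `bicluster_type` and the tolerance `tol` are hypothetical; practical bi-clustering algorithms score noisy data with measures such as the mean squared residue rather than exact tests like these.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Extract the bi-cluster B(I, J): rows I and columns J of `data`."""
    return [[data[i][j] for j in cols] for i in rows]


def _close(a: float, b: float, tol: float) -> bool:
    return abs(a - b) <= tol


def bicluster_type(b: Matrix, tol: float = 1e-9) -> str:
    """Classify an idealized bi-cluster, assuming additive models.

    Checks run from most to least specific, so the first match is returned.
    """
    if not b or not b[0]:
        raise ValueError("bi-cluster must be non-empty")
    n, m = len(b), len(b[0])
    flat = [x for row in b for x in row]
    if all(_close(x, flat[0], tol) for x in flat):
        return "constant values"            # a_ij = mu
    if all(_close(x, row[0], tol) for row in b for x in row):
        return "constant rows"              # a_ij = mu + alpha_i
    if all(_close(b[i][j], b[0][j], tol) for i in range(n) for j in range(m)):
        return "constant columns"           # a_ij = mu + beta_j
    # Additive coherence: each row equals the first row shifted by a constant.
    if all(_close(b[i][j] - b[0][j], b[i][0] - b[0][0], tol)
           for i in range(n) for j in range(m)):
        return "coherent values"            # a_ij = mu + alpha_i + beta_j
    # Order-preserving coherent evolution: all rows rise along one column order.
    order = sorted(range(m), key=lambda j: b[0][j])
    if all(b[i][order[k]] < b[i][order[k + 1]]
           for i in range(n) for k in range(m - 1)):
        return "coherent evolution"
    return "none of the above"


if __name__ == "__main__":
    data = [[1, 2, 3, 10], [4, 5, 6, 0], [7, 9, 8, 2]]
    print(bicluster_type(submatrix(data, [0, 1], [0, 1, 2])))  # coherent values
    print(bicluster_type(submatrix(data, [0, 1, 2], [0, 1])))  # coherent evolution
```

Because a constant-rows bi-cluster is also a special case of coherent values, the checks are ordered from most to least restrictive so that each bi-cluster is reported under its most specific type.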
Simultaneous clustering is also an interesting paradigm for unsupervised data analysis because it is more useful, has fewer parameters, is scalable and can effectively interlink row and column information.

Table 1. Comparison between clustering and simultaneous clustering
Clustering | Simultaneous clustering
Applied to either the rows or the columns of the data matrix separately. Global model. | Performs clustering in the two dimensions simultaneously. Local model.
Produces clusters of rows or clusters of columns. | Seeks blocks of rows and columns that are