Improved biclustering on expression data through overlapping control

Beatriz Pontes
Department of Computer Science, University of Seville, Seville, Spain, and
Federico Divina, Raúl Giráldez and Jesús S. Aguilar-Ruiz
School of Engineering, Pablo de Olavide University, Seville, Spain

Abstract
Purpose – The purpose of this paper is to present a novel control mechanism for avoiding overlapping among biclusters in expression data.
Design/methodology/approach – Biclustering is a technique used in the analysis of microarray data. One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch). Although this heuristic is successful at finding interesting biclusters, it has several drawbacks. The main shortcoming is that it introduces random values into the expression matrix to control overlapping. The overlapping control method presented in this paper is based on a matrix of weights that is used to estimate the overlap of a bicluster with those already found. In this way, the algorithm always works on real data, so the biclusters it discovers contain only original data.
Findings – The paper shows that, after some iterations, the original algorithm wrongly estimates the quality of the biclusters because of the random values it introduces. The empirical results show that the proposed approach is effective in improving the heuristic. It is also important to highlight that many interesting biclusters found with this approach would not have been obtained using the original algorithm.
Originality/value – The original algorithm proposed by Ch&Ch is one of the most successful algorithms for discovering biclusters in microarray data. However, it has some limitations, the most relevant being the substitution phase adopted to avoid overlapping among biclusters. The modified version of the algorithm proposed in this paper improves on the original one, as shown by the experiments.
Keywords Programming and algorithm theory, Data structures, Genes
Paper type Technical paper

International Journal of Intelligent Computing and Cybernetics, Vol. 2 No. 3, 2009, pp. 477-493. © Emerald Group Publishing Limited, 1756-378X. DOI 10.1108/17563780910982707
Received 7 November 2008. Revised 10 December 2008. Accepted 15 December 2008.
The current issue and full text archive of this journal is available at www.emeraldinsight.com/1756-378X.htm
This research is supported by the Spanish Ministry of Science and Technology under grant TIN2007-68084-C02-00 and the Junta de Andalucía Research Program.

1. Introduction
By measuring the expression level of a large number of genes (from the same organism or from different ones) under different experimental conditions (different environments, individuals, time series, different cells, etc.), it is possible to analyze the behavior of the genes. The expression level of a gene is a measure of the gene's activity; in general, it reflects the relative amount of mRNA expressed under an experimental condition. This analysis makes it possible to discover or justify certain biological phenomena (Harpaz and Haralick, 2006).
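
To make the weight-matrix idea from the abstract concrete, the following minimal Python sketch treats the expression data as a genes × conditions matrix and keeps a second matrix of the same shape that records how often each cell has been covered by a previously found bicluster. The counting rule, the mean-weight overlap score and the names `update_weights` and `overlap_estimate` are assumptions made for illustration only; they are not the formulation used in the paper.

```python
def update_weights(weights, rows, cols):
    """Record a found bicluster: add 1 to the weight of every cell it covers."""
    for i in rows:
        for j in cols:
            weights[i][j] += 1


def overlap_estimate(weights, rows, cols):
    """Mean weight over the candidate's cells (0.0 means no cell was used before)."""
    total = sum(weights[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Toy expression matrix: 4 genes (rows) x 5 conditions (columns).
n_genes, n_conditions = 4, 5
expression = [[float(i * n_conditions + j) for j in range(n_conditions)]
              for i in range(n_genes)]
original = [row[:] for row in expression]  # copy, to show the data is never altered

# Hypothetical weight matrix, initially all zeros.
weights = [[0] * n_conditions for _ in range(n_genes)]

# A first bicluster (genes 0-1, conditions 0-2) is found and recorded.
update_weights(weights, rows=[0, 1], cols=[0, 1, 2])

# A candidate bicluster (genes 1-2, conditions 2-3) shares one cell with it.
score = overlap_estimate(weights, rows=[1, 2], cols=[2, 3])
print(score)  # 0.25: one of the four candidate cells was already covered
```

Because overlap is tracked in a separate matrix, the expression values themselves are never overwritten, which is the property the abstract contrasts with the random-value substitution of the original algorithm.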