Classification of Clinical Gene-Sample-Time Microarray Expression Data via Tensor Decomposition Methods

Yifeng Li and Alioune Ngom

School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4
{li11112c,angom}@uwindsor.ca
http://cs.uwindsor.ca/uwinbio

Abstract. With the recent advances in microarray technology, the expression levels of genes with respect to samples can be monitored over a series of time points. Such three-dimensional microarray data, termed gene-sample-time (GST) microarray data, are gene expression matrices measured as a time-series. They have so far received little attention, and analysis methods need to be devised specifically to tackle the complexity of GST datasets. We propose methods based on tensor decomposition for sample classification. We use tensor decomposition both to extract discriminative features and to reduce the high dimensionality multilinearly. We then classify the test samples in the reduced spaces. We have tested and compared our approaches on a real GST dataset. We show that our methods are at least comparable in prediction accuracy to recent methods devised for GST data. Most importantly, our methods run much faster than current approaches.

Keywords: Gene-Sample-Time Data, Tensor Decomposition, HOSVD, HOOI, HONMF.

1 Introduction

DNA microarray technology can monitor thousands of genes in parallel, dramatically accelerating molecular biology experiments and providing a huge amount of data for finding, for instance, co-regulated genes, gene functions, genetic networks, and marker genes. There are two types of microarray data: gene-sample data sets, which compile the expression levels of various genes over a set of biological samples; and gene-time data sets, which record the expression levels of various genes over a series of time-points.
Both types of data are represented by a two-dimensional (2D) gene expression matrix, where genes correspond to rows in the matrix and each matrix entry contains the expression level of a given gene for a given sample or at a given time-point. The gene-sample data are static data, while the gene-time data are dynamic data. The gene-sample data are typically analyzed in clinical research, while the gene-time data are usually obtained to investigate gene regulation. Because gene regulation and expression vary over time, a single snapshot is insufficient to capture the activities of genes, and analyses based on such static data may lead to false discoveries.

Corresponding author.

R. Rizzo and P.J.G. Lisboa (Eds.): CIBB 2010, LNBI 6685, pp. 275–286, 2011. © Springer-Verlag Berlin Heidelberg 2011
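
The relationship between the two 2D matrix types and three-dimensional GST data described above can be illustrated with a small sketch. It is not taken from the paper: the axis order (genes x samples x time-points), the function names, and the random values are all assumptions made for illustration. A gene-sample matrix then corresponds to fixing one time-point of the tensor, and a gene-time matrix to fixing one sample.

```python
import random


def make_gst_tensor(n_genes, n_samples, n_times, seed=0):
    """Build a synthetic genes x samples x time-points nested list.

    The values are random placeholders, not real expression levels.
    """
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(n_times)]
             for _ in range(n_samples)]
            for _ in range(n_genes)]


def gene_sample_matrix(tensor, t):
    """Static snapshot: genes x samples matrix at time-point index t."""
    return [[sample_series[t] for sample_series in gene] for gene in tensor]


def gene_time_matrix(tensor, s):
    """Dynamic profile: genes x time-points matrix for sample index s."""
    return [list(gene[s]) for gene in tensor]


# Hypothetical sizes: 4 genes, 2 samples, 3 time-points.
gst = make_gst_tensor(n_genes=4, n_samples=2, n_times=3)
snapshot = gene_sample_matrix(gst, t=0)  # 4 genes x 2 samples
profile = gene_time_matrix(gst, s=1)     # 4 genes x 3 time-points
```

Both slices read the same underlying entries, so `snapshot[g][1]` and `profile[g][0]` refer to the same measurement for every gene `g`. A static gene-sample study sees only one such slice of the full tensor.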