Abstract—Oral cancer is characterized by multiple genetic events, such as alterations of a number of oncogenes and tumour suppressor genes. The aim of this study is to identify genes, and the functional interactions among them, that may play a crucial role in a specific disease state, particularly during oral cancer progression. We examine gene interaction networks on blood genomic data obtained from twenty-three oral cancer patients at four different time points. To capture genetic interactions, we generate gene-gene networks from sparse experimental temporal data using two methods, Partial Correlations and Kernel Density Estimation. The network study reveals an altered MET (hepatocyte growth factor receptor) network during oral cancer progression, which is further analyzed in relation to other studies.

I. INTRODUCTION

Biological processes that organize functional associations between genes are central to understanding the mechanisms of several diseases, including oral cancer [1]. High-throughput experimental technologies such as DNA microarrays and ChIP-chip allow large-scale molecular measurements, for example the simultaneous measurement of gene expression levels. These technologies have provided thorough insight into complex molecular events in healthy and disease states. The extensive study of such datasets has opened a new perspective on gene-gene association studies, in which constructing networks from experimental data is a promising approach to modeling functional processes. Several computational methodologies have been applied to construct biological networks from different data sources [2]. The main focus of these network approaches is to build target-independent networks that describe pair-wise relations between molecules. Recent examples include Bayesian networks [3] and Pearson's correlation-based approaches [4].
Although these methods have been used successfully to elucidate functional relationships between genes and pathways, they are unlikely to directly reveal the specific gene networks that respond to abnormal physiological conditions such as disease. This limitation stems from experimental errors and inherent genetic complexity [2-4]. The analysis reported herein is an effort to reveal and model the interrelationships of molecules in oral cancer that participate in the many different pathways implicated in this disease. The network construction method proposed in Section II is based on kernel density estimation (KDE). It attempts to model the nonlinear effects of gene interactions and to compensate for the information loss caused by the limited number of data samples. In Section III, our framework is applied to experimental blood data of oral cancer patients collected at four successive follow-ups. The goal is to reveal the network structure and its differences between time slices, as well as prominent genes that play a central role in all stages of the disease.

*Research supported by the “YPERThEN” project, funded by the EU and by funds from Greece and Cyprus, and by the “OASYS” project, funded by the NSRF 2007-13 of the Greek Ministry of Development. K. Kalantzaki, E. S. Bei, M. Garofalakis and M. Zervakis are with the Department of Electronic and Computer Engineering, TUC, Chania 73100, Greece (kkalantzaki@isc.tuc.gr, abei@isc.tuc.gr, michalis@display.tuc.gr, minos@acm.org). K. Exarchos and D. Fotiadis are with the Department of Materials Science and Engineering, University of Ioannina, Ioannina, 45110, Greece (kexarcho@gmail.com, fotiadis@cc.uoi.gr).

II. METHODOLOGY

A. Partial Correlation

Pair-wise associations of co-expressed molecules can be modeled by Pearson's correlation. Identifying the interaction between two variables therefore reduces to estimating the covariance matrix S.
Each element $r_{ik}$ of the correlation matrix $R = (r_{ik})$ is obtained from $S = (s_{ik})$ via $r_{ik} = s_{ik}/\sqrt{s_{ii}\,s_{kk}}$. It represents the correlation coefficient between nodes $X_i$ and $X_k$ and indicates an association.

The method of partial correlations (PC) [4] measures the correlation between two variables after the common effects of all other variables have been removed. An appropriate measure of the strength of these interactions is the partial correlation matrix $\Pi = (\pi_{ik})$. Its coefficients describe the correlation between genes $i$ and $k$ conditioned on all remaining genes of the network. This property is reflected in the inverse covariance matrix $\Omega = S^{-1} = (\omega_{ik})$, whose elements give the partial correlations as

$$\pi_{ik} = -\frac{\omega_{ik}}{\sqrt{\omega_{ii}\,\omega_{kk}}} \qquad (1)$$

Given the experimental data, the covariance matrix is computed and then inverted, after which the partial correlations $\pi_{ik}$ follow directly from (1). Values of $|\pi_{ik}|$ close to zero indicate conditional independence between $i$ and $k$ given the remaining variables in the graph. Large values of $|\pi_{ik}|$ indicate dependence between $i$ and $k$, which justifies adding an edge between these nodes. However, this approach is only applicable if the number of samples in the dataset is larger than the number of genes/proteins. Otherwise, $S$ is singular or ill-conditioned, so its inversion is unstable and estimating $S^{-1}$ becomes a non-trivial task. To overcome this obstacle, we invert $S$ through the Moore-Penrose pseudoinverse [4]. This is a generalization of the standard matrix inverse, computed from the singular value decomposition (SVD).

Identification of altered MET network in Oral Cancer Progression based on Nonparametric Network Design*
K. Kalantzaki, E. S. Bei, K. P. Exarchos, M. Zervakis, Member, IEEE, D. I. Fotiadis, Member, IEEE, and M. Garofalakis, Member, IEEE

B. Kernel Density Estimation

Kernel density estimation (KDE) [5] is a non-parametric framework that estimates the probability density function (pdf) of a random variable. Assume that a generic network is developed from a limited i.i.d. genomic dataset $X = (x_1, \ldots, x_n)$, where $x_i$ denotes the $i$-th sample of gene $X$. KDE estimates the pdf of $X$ as follows:

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (2)$$
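The partial-correlation procedure of Section II-A can be sketched in Python with NumPy. Its steps are the sample covariance, the SVD-based Moore-Penrose pseudoinverse, Eq. (1), and edge selection. This is a minimal illustration under stated assumptions, not the authors' implementation:

- samples are assumed to be rows and genes columns;
- the helper names `partial_correlations` and `pc_edges` are hypothetical;
- the relative cutoff for numerically zero singular values (`rcond=1e-10`) is an assumed choice;
- the edge `threshold` is a hypothetical parameter, since this section does not state how large $|\pi_{ik}|$ must be to add an edge.

```python
import numpy as np


def partial_correlations(data: np.ndarray) -> np.ndarray:
    """Partial correlation matrix Pi from an (n_samples, n_genes) array.

    Hypothetical helper: S is inverted with the Moore-Penrose pseudoinverse,
    so the computation still runs when n_samples <= n_genes and S is singular.
    Assumes no gene is constant (otherwise omega_ii = 0 below).
    """
    S = np.cov(data, rowvar=False)  # sample covariance; genes are columns
    # Pseudoinverse of the symmetric matrix S. The relative cutoff rcond
    # (an assumed value) treats numerically zero singular values as exact zeros.
    omega = np.linalg.pinv(S, rcond=1e-10, hermitian=True)
    d = np.sqrt(np.diag(omega))     # sqrt(omega_ii)
    pi = -omega / np.outer(d, d)    # Eq. (1): -omega_ik / sqrt(omega_ii * omega_kk)
    np.fill_diagonal(pi, 1.0)       # convention: each gene fully correlates with itself
    return pi


def pc_edges(pi: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Undirected edges (i, k), i < k, whose |pi_ik| exceeds a hypothetical threshold."""
    p = pi.shape[0]
    return [(i, k) for i in range(p) for k in range(i + 1, p)
            if abs(pi[i, k]) > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated chain x -> y -> z: x and z are correlated, but conditionally
    # independent given y, so pi[0, 2] should be close to zero.
    x = rng.normal(size=5000)
    y = x + rng.normal(size=5000)
    z = y + rng.normal(size=5000)
    pi = partial_correlations(np.column_stack([x, y, z]))
    print(np.round(pi, 2))
    print(pc_edges(pi, threshold=0.3))

    # n < p case (4 samples, 6 genes): S is singular, but the pseudoinverse still applies.
    print(partial_correlations(rng.normal(size=(4, 6))).shape)
```

In the simulated chain, the ordinary correlation between x and z is strong, while their partial correlation given y is near zero. This is exactly the distinction that lets PC drop indirect edges. When the data have more samples than genes and S is well conditioned, the pseudoinverse coincides with the ordinary inverse.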
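Eq. (2) can be evaluated directly on one gene's samples. The sketch below is an illustration, not the authors' code. It assumes a Gaussian kernel $K(u) = e^{-u^2/2}/\sqrt{2\pi}$ and Silverman's rule-of-thumb bandwidth $h = 1.06\,\hat{\sigma}\,n^{-1/5}$, neither of which is specified in this section. The function names `kde_1d` and `silverman_bandwidth` are hypothetical. It covers only the density estimate itself, not how the estimated densities are turned into network edges.

```python
import numpy as np


def silverman_bandwidth(samples: np.ndarray) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sigma_hat * n**(-1/5) (assumed choice; needs n >= 2)."""
    samples = np.asarray(samples, dtype=float)
    return 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)


def kde_1d(samples: np.ndarray, x: np.ndarray, h: float) -> np.ndarray:
    """Evaluate Eq. (2), f_h(x) = 1/(n h) * sum_i K((x - x_i) / h), at points x.

    K is assumed here to be the standard Gaussian kernel.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - samples[None, :]) / h          # scaled distances, shape (len(x), n)
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return K.sum(axis=1) / (samples.size * h)         # average and rescale by 1/(n h)


if __name__ == "__main__":
    # Hypothetical expression values of one gene at four follow-ups.
    gene = np.array([0.8, 1.1, 1.5, 0.9])
    h = silverman_bandwidth(gene)
    grid = np.linspace(-1.0, 3.0, 9)
    print(round(h, 3))
    print(np.round(kde_1d(gene, grid, h), 3))
```

With only a handful of samples per gene, the shape of $\hat{f}_h$ depends strongly on $h$. A data-driven rule such as Silverman's is therefore used here only as a stand-in for whatever bandwidth selection the method actually employs.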