Network Expansion and Pathway Enrichment Analysis towards Biologically Significant Findings from Microarrays

Xiaogang Wu 1,2,*, Hui Huang 1,*, Tao Wei 3, Ragini Pandey 2, Christoph Reinhard 3, Shuyu D. Li 3,†, Jake Y. Chen 1,2,†

1 School of Informatics, Indiana University, Indianapolis, IN 46202, USA
2 MedeoLinx, LLC, Indianapolis, IN 46280, USA
3 Eli Lilly and Company, Indianapolis, IN 46285, USA

Summary

In many cases, crucial genes show only slight expression changes between groups of samples (e.g. normal vs. disease), and many genes selected by statistical differential expression analysis of microarrays are poorly annotated and lack clear biological significance. In this paper, we present an innovative approach to integrative microarray analysis: network expansion and pathway enrichment analysis (NEPEA). We assume that organized knowledge can substantially help microarray data analysis, and that this knowledge can be represented as molecular interaction networks or biological pathways. Based on this hypothesis, we develop the NEPEA framework, which combines network expansion using the human annotated and predicted protein interaction (HAPPI) database with pathway enrichment using the human pathway database (HPD). As a case study, we use a recently published microarray dataset (GSE24215) related to insulin resistance and type 2 diabetes (T2D), since the original study provided thorough experimental validation for both the genes and the pathways identified computationally by classical microarray analysis and pathway analysis. Starting from the results of the classical microarray analysis, we apply NEPEA to this dataset to identify biologically significant genes and pathways. Our findings are largely consistent with the original findings and gain additional support from other published literature.
* These authors contributed equally to this work.
† To whom correspondence should be addressed. Email: jakechen@iupui.edu, li_shuyu_dan@lilly.com

Journal of Integrative Bioinformatics, 9(2):213, 2012. http://journal.imbio.de doi:10.2390/biecoll-jib-2012-213
Copyright 2012 The Author(s). Published by Journal of Integrative Bioinformatics. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (http://creativecommons.org/licenses/by-nc-nd/3.0/).

1 Background

Microarrays make it possible to discover new functions and pathways of known genes, as they measure all the transcriptional activity in a biological sample [1]. This high-throughput procedure can be used in medical diagnostics, in biomarker discovery, and in investigating how a drug, disease, polymorphism or environmental condition affects gene expression and function [2, 3]. However, microarray technology generates a large amount of transcriptional data, and interpreting the results to gain insight into biological mechanisms is challenging [4]. As a result, researchers have sought to analyze microarray data with modern computational tools and statistical methods. In many cases, crucial genes show only slight changes, and many genes selected by statistical differential expression analysis between groups of samples (e.g. normal vs. disease) are poorly annotated [2]. From a biological perspective, functionally related genes often display coordinated expression to accomplish their roles in