Detecting Pathway Cross-talks by Analyzing Conserved Functional Modules across Multiple Phenotype-Expressing Organisms

Kevin Wilson §, Andrea M. Rocha, Kanchana Padmanabhan ∗†, Kuangyu Wang ∗†, Zhengzhang Chen ∗†, Ye Jin ∗†, James R. Mihelcic, and Nagiza F. Samatova ∗†¶

North Carolina State University, Raleigh, NC 27695
Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831
University of South Florida, Tampa, FL 33620
§ RTI International, Durham, NC 27709
Corresponding author: Samatova@csc.ncsu.edu

Abstract: Biological systems are organized hierarchically, starting from the protein level and expanding to the pathway level or even higher. Understanding interactions at lower levels of the hierarchy (protein interactions) helps us understand interactions at higher levels (pathway cross-talks). Identifying cross-talks that are related to the expression of a particular phenotype is of interest to genetic engineers, because it provides information on how different cellular subsystems could work together to express that phenotype. Current research has typically focused on identifying genotype-phenotype associations or pathway-phenotype associations. In contrast, we developed a method to identify phenotype-related pathway cross-talks by obtaining conserved groups of interacting proteins (functional modules). By applying our method to two groups of hydrogen-producing organisms (light fermentation and dark fermentation), we show that it effectively unearths known pathway cross-talks that are important to hydrogen production.

Keywords: protein functional module; phenotype-expressing organism; pathway; cross-talk

I. INTRODUCTION

Proteins, such as enzymes, often work together to achieve a particular function. A metabolic pathway is a series of chemical reactions catalyzed by enzymes. Different metabolic pathways may cross-talk (interact) with each other for purposes such as regulation or compensation.
For example, in Anabaena (Nostoc) sp. PCC7120, cross-talks between nitrogen, iron, and central metabolism have been observed at the regulatory level [1], [2]. Nitrogen metabolism cross-talks with the iron uptake pathway in that the nitrogen regulator, NtcA, is able to alter the expression of the iron uptake protein, FurA [3]. Interaction between the two proteins is important for maintaining iron homeostasis, which is essential for nitrogen fixation in organisms such as Anabaena. Moreover, in a study by Lopez-Gollomon et al. [2], NtcA and FurA were shown to co-regulate the expression of several genes involved in nitrogen metabolism and photosynthesis, which demonstrates how interrelated many metabolic pathways are within microorganisms. Understanding cross-talks in metabolic networks is particularly important when engineering metabolic pathways for enhanced expression of a trait or a desired end-product for industrial use (e.g., ethanol and hydrogen).

II. APPROACH

A. Overview

To identify cross-talks that contribute to the expression of a specific phenotype, we include multiple phenotype-expressing organisms in our study, based on the assumption that phenotype-related cross-talks are likely to be conserved across organisms with the same phenotype. The phenotype-related cross-talks can be identified by analyzing groups of interacting proteins (functional modules) present across multiple phenotype-expressing organismal networks.

There are several ways in which a functional module is typically modeled, the most common being the clique and cluster models. Cliques are completely connected subgraphs; hence, using a clique as the model might not allow us to capture some subtle cross-talk mechanisms that may exist. For example, a cross-talk mechanism in which only a few proteins from each pathway interact, while the rest of the proteins have no interaction, will not form a clique. Thus, in this paper we use a cluster to model the conserved functional module.
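The difference between the clique and cluster models can be illustrated with a small sketch. The code below is not the authors' implementation; the protein names (`a1`..`b3`), pathway labels (`A`, `B`), and interaction edges are hypothetical, chosen only to show a module in which two pathways are joined by a single cross-talk interaction. Such a module fails the clique test but still passes a connectivity test.

```python
from itertools import combinations

def is_clique(nodes, edges):
    """True if every pair of nodes is joined by an interaction edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))

def is_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is one connected component."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        # `nodes - seen` is a fresh set, so updating `seen` in the loop is safe.
        for v in nodes - seen:
            if frozenset((u, v)) in edges:
                seen.add(v)
                stack.append(v)
    return seen == nodes

# Hypothetical toy data: two pathways linked by one cross-talk interaction.
pathway_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
edges = {frozenset(e) for e in [
    ("a1", "a2"), ("a2", "a3"),   # interactions within pathway A
    ("b1", "b2"), ("b2", "b3"),   # interactions within pathway B
    ("a3", "b1"),                 # the only interaction between A and B
]}
module = set(pathway_of)

# Edges whose endpoints belong to different pathways are candidate cross-talks.
crosstalk = sorted(
    tuple(sorted(e)) for e in edges
    if len({pathway_of[p] for p in e}) == 2
)

print("clique:", is_clique(module, edges))
print("connected:", is_connected(module, edges))
print("cross-talk edges:", crosstalk)
```

In this toy network the six proteins are not a clique, so a clique-based model would miss the module, whereas a connectivity-based cluster retains it together with the inter-pathway edge.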
The only restriction we place is that the conserved cluster of proteins must form a connected component. A disconnected set of proteins may not be interacting at all and likely does not cross-talk. Another factor to consider is that not all phenotype-expressing organisms may use the same cross-talk mechanisms, so it is important to capture signals that may be present in only a particular subset of the organisms. Thus, our method enumerates all the conserved connected components present across all or a subset of the given set of phenotype-expressing organisms. These components are further analyzed for potential cross-talk mechanisms. However, directly comparing the organismal networks to identify these conserved modules might not be tractable. Hence, we

2011 IEEE International Conference on Bioinformatics and Biomedicine. 978-0-7695-4574-5/11 $26.00 © 2011 IEEE. DOI 10.1109/BIBM.2011.35