Analyzing Genetic Factors Involved in Recombinant Protein Expression Enhancement

Daniel Johnson*, Keat Teoh, Cody Ashby, Elizabeth Hood and Xiuzhen Huang*
Arkansas State University, Jonesboro, Arkansas 72467 U.S.A.
Corresponding emails: Daniel.johnson@smail.astate.edu; xhuang@astate.edu

Abstract: Understanding the genetic factors that promote recombinant protein accumulation in transgenic plants will inform strategies for improving the efficiency of plant protein biofactories. By combining biological and bioinformatics analyses, this work aims to determine the genetic and biological factors that increase accumulation of a bacterial cellulase enzyme in transgenic maize. Microarray experiments were performed on maize near-isogenic lines that exhibit high and low accumulation of the enzyme expressed from a transgene in seeds. Microarray data analysis identified 2,313 genes that exhibited at least a 1.5-fold change in expression level. Of these, 161 genes are shown to be statistically valid: 82 are up-regulated and 79 are down-regulated. Preliminary functional analysis of the genes was conducted, and several pathways of biological importance were tentatively identified. These genes code for four categories of proteins: known proteins, zein proteins, putative proteins and unknown proteins. Further functional clustering and annotation analysis of these genes will help construct and define interaction networks and predict important metabolic pathways, in order to understand the controlling mechanisms that lead to the hyper-accumulation phenomenon.

Keywords: gene expression; microarray data; recombinant protein accumulation

I. BACKGROUND

Plant biotechnology was developed to engineer plants to express traits that improve their growth and productivity [11].
In the last twenty years, plants have been used to produce output traits including pharmaceuticals and industrial enzymes [4, 7, 10]. Among those industrial enzymes are cellulases, which are useful for deconstructing cellulose for bio-based products. Seed-based expression is a useful system for stable, high-level accumulation of a target protein expressed from a transgene [16]. Among other proteins, both an endo-cellulase (E1) and an exo-cellulase (CBHI) have been expressed in seed [9].

One of the major roadblocks to the use of cellulases in the biofuels and bio-based products industries is the lack of an inexpensive, large-scale source of the enzymes. In an effort to produce large amounts of cellulase for industrial applications, the maize seed expression system was tested [9]. For the results reported here, the Acidothermus cellulolyticus endo-β-1,4-glucanase gene (E1) was placed under the control of maize embryo-preferred promoter elements to induce high levels of recombinant protein in seed. Six generations of backcross breeding were performed with the goal of developing production lines with good agronomic traits. An additional result of the breeding program was the recovery of seed with a greater than 10-fold increase in cellulase protein relative to first-generation seed.

Although a number of proteins (LtB, cellulase, trypsin and laccase) have shown this phenomenon of increased accumulation through breeding and selection for seed-based expression [9, 16], the selection and analysis have been empirical, without an appreciation of the mechanism. This study begins to address the question of which factors control this increase in gene expression. The ultimate goal is to understand the genetic basis of this remarkable phenomenon so that the factors can be selected for directly, in an effort to increase expression for cost-effective production of proteins from plants.
This selection phenomenon, which allows the recovery of lines with higher accumulation of individual proteins, has been observed for all proteins that the present authors have expressed in corn seed to date [8]. The mechanism behind this phenomenon has fascinated us for the last several years, and this study begins to address it.

Microarrays can be used to determine gene expression patterns during development and under other treatments [2, 4, 6, 13, 14, 17]. In this study, microarray experiments were performed to assess which genes influence the increased protein accumulation in these maize near-isogenic lines. A single transgenic event (BCH0101) of this bacterial gene expressed in maize segregates for high and low protein accumulation. These high and low lines were exploited to begin to understand which genetic and biological factors contribute to this phenomenon. Microarray experiments using RNA from these lines were conducted to assess the differential gene expression contributing to the protein accumulation phenotype.

II. BIOLOGICAL EXPERIMENTS AND MICROARRAY DATA ANALYSIS

A. Development of Transgenic Lines

Near-isogenic maize lines from a single transgenic event were crossed to LH244, a maize inbred line related to B73, for which dense marker maps and a sequenced genome are available. Seeds from crosses with LH244 (a stiff stalk germplasm variety) were used for initial mRNA isolations for microarrays.

2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, 978-1-4244-8302-0/10/$26.00 ©2010 IEEE

Reference [9] reported the development of the transgenic corn lines that express the E1 endo-