Analyzing Genetic Factors Involved in Recombinant
Protein Expression Enhancement
Daniel Johnson*, Keat Teoh, Cody Ashby, Elizabeth Hood and Xiuzhen Huang*
Arkansas State University, Jonesboro, Arkansas 72467 U.S.A.
Corresponding emails: Daniel.johnson@smail.astate.edu; xhuang@astate.edu
Abstract—Understanding the genetic factors that promote
recombinant protein accumulation in transgenic plants
will suggest strategies for improving protein biofactory
efficiency. By combining biological and bioinformatics
analyses, this work aims to determine the genetic and
biological factors underlying increased accumulation of a
bacterial cellulase enzyme in transgenic maize.
Microarray experiments were performed on maize
near-isogenic lines that exhibit high and low accumulation
of the enzyme expressed from a transgene in seeds.
Microarray data analysis identified 2,313 genes that
exhibited at least a 1.5-fold change in expression level.
Of these, 161 genes were statistically significant: 82
were up-regulated and 79 were down-regulated.
Preliminary functional analysis of the genes was
conducted and several pathways of biological importance
were tentatively identified. These genes code for four
categories of proteins: known proteins, zein proteins,
putative proteins and unknown proteins. Further
functional clustering and annotation analysis for these
genes will help construct and define networks of
interaction as well as predict important metabolic
pathways to understand the controlling mechanisms that
lead to the hyper-accumulation phenomenon.
Keywords-gene expression; microarray data;
recombinant protein accumulation
I. BACKGROUND
Plant biotechnology was developed to
engineer plants that express traits improving their
growth and productivity [11]. In the last twenty years,
plants have also been used to produce output traits,
including pharmaceuticals and industrial enzymes
[4, 7, 10]. Among those industrial enzymes are
cellulases, which will be useful for deconstructing
cellulose for bio-based products.
Seed-based expression is a useful system for
stable, high-level accumulation of a target protein
expressed from a transgene [16]. Among other proteins,
both an endo-cellulase (E1) and an exo-cellulase
(CBHI) have been expressed in seed [9]. One of the
major roadblocks to the use of cellulases in the bio-fuels
and bio-based products industries is the availability of
an inexpensive large-scale source of the enzymes. In an
effort to produce large amounts of cellulase for
industrial applications, the maize seed expression
system was tested [9]. For the results reported here, the
Acidothermus cellulolyticus endo-β-1,4-glucanase
gene (E1) was placed under the control of maize
embryo-preferred promoter elements to induce high
levels of recombinant protein in seed. Six generations of
backcross breeding were performed with the goal of
developing production lines with good agronomic traits.
An additional result of the breeding program was the
recovery of seed with a greater than 10-fold increase in
cellulase protein relative to first-generation seed.
Although a number of proteins (LtB, cellulase,
trypsin and laccase) have shown this phenomenon of
increased accumulation through breeding and selection
for seed-based expression [9, 16], the selection and
analysis have been empirical, without an understanding
of the mechanism. This study begins to address which
factors control this increase in gene expression. The
ultimate goal is to understand the genetic basis of this
remarkable phenomenon so that the factors can be
selected for directly, increasing expression for
cost-effective production of proteins from plants.
This selection phenomenon, which allows
higher accumulation of individual proteins to be
recovered, has been observed for all proteins that the
present authors have expressed in corn seed to date [8].
The mechanism behind it has fascinated us for the last
several years and is the question addressed in this study.
Microarrays can be used to determine gene
expression patterns during development and in response
to treatments [2, 4, 6, 13, 14, 17]. In this study,
microarray experiments were performed to assess which
genes influence the increased protein accumulation in
these maize near-isogenic lines. A single transgenic
event (BCH0101) of this bacterial gene expressed in
maize segregates for high and low protein
accumulation. These high and low lines were exploited
to begin to understand what genetic and biological
factors contribute to this phenomenon. Microarray
experiments using RNAs from these lines were
conducted to assess differential gene expression
contributing to the protein accumulation phenotype.
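The comparison of high and low lines described above can be illustrated with a minimal sketch. It assumes the 1.5-fold threshold quoted in the abstract and labels a gene by the ratio of its signal in the high-accumulation line to that in the low-accumulation line. The function name, gene names and intensity values are hypothetical and are not taken from the study; the statistical validation step is not shown.

```python
import math

# 1.5-fold cutoff as quoted in the abstract; everything else is illustrative.
FOLD_THRESHOLD = 1.5


def classify_fold_change(high, low, threshold=FOLD_THRESHOLD):
    """Label a gene by its high-line / low-line expression ratio."""
    if high <= 0 or low <= 0:
        raise ValueError("intensities must be positive")
    # Work on the log2 scale so up- and down-regulation are symmetric.
    log_ratio = math.log2(high / low)
    cutoff = math.log2(threshold)
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical intensities (high line, low line); not data from the study.
example = {
    "gene_a": (300.0, 100.0),
    "gene_b": (100.0, 400.0),
    "gene_c": (120.0, 100.0),
}
calls = {gene: classify_fold_change(h, l) for gene, (h, l) in example.items()}
```

Working on the log2 scale is a design choice of this sketch: a 1.5-fold increase and a 1.5-fold decrease then lie at equal distance from zero, so one cutoff serves both directions.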
II. BIOLOGICAL EXPERIMENTS AND MICROARRAY DATA
ANALYSIS
A. Development of Transgenic Lines
Near-isogenic maize lines from a single
transgenic event were crossed to LH244, a maize inbred
line related to B73, for which dense marker maps are
available and whose genome has been sequenced. Seeds
from crosses with LH244 (a stiff stalk germplasm
variety) were used for initial mRNA isolations for
microarrays. Reference [9] reported the development of
the transgenic corn lines that express the E1 endo-
2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops
978-1-4244-8302-0/10/$26.00 ©2010 IEEE 240