Current Protein and Peptide Science, 2009, 10, 270-285
1389-2037/09 $55.00+.00 © 2009 Bentham Science Publishers Ltd.
A Guide to Template Based Structure Prediction
Xiaotao Qu¹, Rosemarie Swanson², Ryan Day³ and Jerry Tsai³,*
¹Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, USA;
²Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77843, USA;
³Department of Chemistry, University of the Pacific, 3601 Pacific Avenue, Stockton, CA 95211, USA
Abstract: Template based protein structure prediction (commonly referred to as homology or comparative modeling) uses
knowledge of solved structures to model a protein sequence’s native or true fold. First, a parent structure is found and then
a template structure is built by mapping the target sequence onto the parent structure. This putative structure is refined
using a combination of backbone moves, side-chain packing, and loop modeling. Template based protein structure
prediction has always held great promise to produce atomically accurate models close to the native conformation based on two
major assumptions. First, similar sequences exhibit similar protein folds. Second, soluble proteins populate a discrete fold
space with many representatives already solved in our Protein Data Bank (PDB). Ironically, beginning so close to the
native structure is also the primary source of problems confronting this method and is the reason for the lack of progress in
this category of structure prediction. In this review, the general concepts and procedures for template based structure
prediction are outlined based on the following topics: sequence alignment, parent structure selection, template structure
building, refinement, evaluation, and final structure selection. Then, a description of established software and algorithms
is provided where the advantages and limitations of the different methods will be pointed out. This is followed by a
discussion of the developments in template based structure prediction up to the 7th Critical Assessment of Structure
Prediction meeting. Lastly, we will address the increased difficulty in improving templates that start so close to the native
structure, and discuss the improvements needed in this field.
Keywords: Template based modeling/prediction (TBM), structure prediction, side-chain packing, structure refinement, loop
modeling, multiple sequence alignment, model evaluation, structure selection.
*Address correspondence to this author at the Department of Chemistry,
University of the Pacific, 3601 Pacific Avenue, Stockton, CA 95211, USA;
Tel: (209) 946-2298; Fax: (209) 946-2607; E-mail: jtsai@pacific.edu
INTRODUCTION
Although commonly known as homology modeling and, more
recently, comparative modeling [1-3], the method of
creating a prediction of an unknown structure using a close
structural homolog is better described as template based
modeling/prediction (TBM) of protein structure (Fig. 1).
This is now the accepted terminology in the protein structure
prediction community. The new designation is more general
and allows a clear contrast with template free structure
prediction, more commonly known as ab initio or de novo
modeling [4-6]. Because it is believed that a representative
of every protein fold will eventually be solved [7-13],
template based structure prediction holds a great deal of
promise for the field of protein structure modeling. The
availability of a representative fold as a starting template
for a sequence of unknown structure offers the quickest path
to generating a model of the real structure. Furthermore,
template based methods produce the most reliable and accurate
predictions of protein structure aside from experimental
determination [14, 15]. Unfortunately, the small,
hard-to-define variations between the close template and the
real structure are the major source of challenges facing this
field today. In fact, template based structure prediction has
been trying to overcome these obstacles since its inception.
In the following discussion, the primary problems inherent to
starting with inexact templates will be explained in more
detail. For consistency and clarity, we will adhere to the
following terminology throughout this review, as shown in
Fig. (1). “Native” refers to the experimentally determined
structure of the target sequence. “Target” refers to the
protein being predicted/modeled. “Parent” refers to an
initial known protein structure that is used to create the
starting “template” structure. Finally, a “model” structure
is any prediction of the native structure; however, it
usually indicates structures refined from the starting
template.
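To make the relationship between target, parent, and template concrete, the following is a minimal Python sketch of how a pairwise alignment can be turned into a residue-level template map. It is an illustration only, not the procedure of any particular program discussed in this review. The names `TemplatePosition`, `build_template_map`, and `sequence_identity`, the status labels, and the example sequences are all hypothetical, and "conserved" is assumed here to mean an identical residue at an aligned position.

```python
from dataclasses import dataclass
from typing import List, Optional

GAP = "-"  # gap character assumed in the aligned sequences


@dataclass
class TemplatePosition:
    """One target residue and its (possibly missing) parent equivalent."""
    target_index: int              # 0-based position in the ungapped target
    target_residue: str
    parent_index: Optional[int]    # 0-based position in the ungapped parent
    parent_residue: Optional[str]
    status: str                    # "conserved", "substituted", or "insertion"


def build_template_map(aligned_target: str,
                       aligned_parent: str) -> List[TemplatePosition]:
    """Map every target residue onto the parent using a given alignment.

    "conserved":   identical residue, parent coordinates can be copied.
    "substituted": backbone can be copied, side chain must be rebuilt.
    "insertion":   no parent residue, coordinates must be modeled (loop).
    Parent residues aligned to a target gap are deletions and are skipped.
    """
    if len(aligned_target) != len(aligned_parent):
        raise ValueError("aligned sequences must have equal length")

    positions: List[TemplatePosition] = []
    t_idx = 0  # running index into the ungapped target sequence
    p_idx = 0  # running index into the ungapped parent sequence
    for t_res, p_res in zip(aligned_target, aligned_parent):
        if t_res == GAP and p_res == GAP:
            continue  # empty column, nothing to map
        if t_res == GAP:
            p_idx += 1  # deletion relative to the parent
            continue
        if p_res == GAP:
            positions.append(
                TemplatePosition(t_idx, t_res, None, None, "insertion"))
        else:
            status = "conserved" if t_res == p_res else "substituted"
            positions.append(
                TemplatePosition(t_idx, t_res, p_idx, p_res, status))
            p_idx += 1
        t_idx += 1
    return positions


def sequence_identity(positions: List[TemplatePosition]) -> float:
    """Fraction of aligned (non-insertion) target residues that are identical."""
    aligned = [p for p in positions if p.parent_index is not None]
    if not aligned:
        return 0.0
    return sum(p.status == "conserved" for p in aligned) / len(aligned)


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from any real protein.
    template_map = build_template_map("MKT-LLVAGG", "MRTALL--GG")
    for pos in template_map:
        print(pos.target_index, pos.target_residue, pos.parent_index, pos.status)
    print(f"identity over aligned positions: {sequence_identity(template_map):.2f}")
```

In this sketch the alignment is taken as given. Positions labeled "insertion" have no parent coordinates and would have to be built by loop modeling, while "substituted" positions inherit the parent backbone but need new side chains.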
Although the delineation between steps is somewhat
arbitrary since many methods combine steps, we have organized
our discussion of TBM into the four steps [16-18] outlined in
Fig. (1). First, the parent structure(s) are identified using
sequence searches against the known structure database (the
Protein Data Bank [19]). Second, the initial template
structure(s) are constructed by aligning the target sequence to
the parent structure and by identifying conserved and variable
regions. Third, the structure(s) are refined through a
combination of backbone moves, side-chain packing, and loop
modeling of highly variable regions. This step is an attempt
to sample the conformational space of the native structure,
and it usually generates a large number (on the order of
thousands) of potential models. The last step is therefore to
evaluate these models and choose the one that is nearest in
structure to the native. In many methods described below, the
first and second steps occur concurrently, and the same can be
said for the third and fourth steps. Newer approaches include
de novo prediction of variable regions during refinement as well as