J.-L. Hainaut et al. (Eds.): ER Workshops 2007, LNCS 4802, pp. 14–23, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Massive Protein Structural Property Explorations Using
New Indexing Mechanism
Yu-Feng Huang¹, Chia-Chen Chang², and Chien-Kang Huang²,*
¹ Department of Computer Science and Information Engineering, National Taiwan University,
Taipei, Taiwan 106
² Department of Engineering Science and Ocean Engineering, National Taiwan University,
Taipei, Taiwan 106
* Tel.: +886 2 3366 5736, Fax: +886 2 2932 9885
yfhuang@csie.ntu.edu.tw, {r95525051, ckhuang}@ntu.edu.tw
Abstract. To characterize the environment of a residue, we use the residue
environmental sphere (RES), a sphere with a radius of 10 Å, to describe the
information surrounding that residue. To detect residue-residue contacts
quickly and efficiently, we decompose each protein structure into many such
spheres; storing the protein structures and the sphere information in a
database is a considerable challenge. We therefore build a database of protein
structure, ligand/substrate, and DNA/RNA information that supports fast search
and mining of the residue environments of protein structures. Within each
residue environmental sphere, we can easily identify the neighboring residues
and consider their properties, including secondary structure, physicochemical
properties, and B-factor. In this paper, we focus on the disulfide bond, which
stabilizes protein folding. We detect all possible residue contacts of cysteine
pairs in three-dimensional space, together with the disulfide bonds between two
cysteines annotated in the Protein Data Bank, to analyze how disulfide bonds
affect protein structures. We use these spheres to represent a protein structure
and build a database of protein structures and their representations for
further analysis.
Keywords: Residue environmental sphere (RES), protein structural property
mining, residue contact, disulfide bond.
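As an illustration of the idea summarized in the abstract, the following minimal Python sketch collects the neighbors that fall inside a 10 Å residue environmental sphere and lists cysteine pairs in contact. It is not the authors' implementation: the `Residue` record, the function names, and the use of a single representative point per residue are assumptions made for this example, and the database-backed indexing described in the paper is replaced here by a plain linear scan.

```python
import math
from typing import NamedTuple

RES_RADIUS = 10.0  # sphere radius in angstroms, as stated in the abstract


class Residue(NamedTuple):
    """Hypothetical record: one representative point per residue (assumed)."""
    chain: str
    number: int
    name: str
    xyz: tuple


def environmental_sphere(center, residues, radius=RES_RADIUS):
    """Return the residues, other than `center`, lying within `radius` of it."""
    return [r for r in residues
            if r is not center and math.dist(center.xyz, r.xyz) <= radius]


def cysteine_contacts(residues, radius=RES_RADIUS):
    """Return each unordered pair of cysteines whose points are within `radius`."""
    cys = [r for r in residues if r.name == "CYS"]
    return [(a, b) for i, a in enumerate(cys) for b in cys[i + 1:]
            if math.dist(a.xyz, b.xyz) <= radius]
```

Pairs returned by `cysteine_contacts` are only candidates; deciding whether a pair forms a disulfide bond would still require comparison with the bonds annotated in the Protein Data Bank.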
1 Introduction
As of July 3, 2007, the Protein Data Bank (PDB) [4] holds 44,476 structures
determined by X-ray crystallography or nuclear magnetic resonance (NMR). They
include proteins, protein complexes, nucleic acids, and protein-nucleic acid
complexes. Applying mining techniques to protein structures in order to discover
residue environmental information is an interesting problem [12, 13, 14]. Residue
environment has been studied for many years and applied to protein threading and
protein binding site characterization [2, 15]. In a protein structure, the residue is
the essential element of conformation, and residue-residue contacts will affect the overall
* Corresponding author.