J.-L. Hainaut et al. (Eds.): ER Workshops 2007, LNCS 4802, pp. 14–23, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Massive Protein Structural Property Explorations Using New Indexing Mechanism

Yu-Feng Huang 1, Chia-Chen Chang 2, and Chien-Kang Huang 2,*

1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 106
2 Department of Engineering Science and Ocean Engineering, National Taiwan University, Taipei, Taiwan 106
* Tel.: +886 2 3366 5736, Fax: +886 2 2932 9885
yfhuang@csie.ntu.edu.tw, {r95525051, ckhuang}@ntu.edu.tw

Abstract. To characterize the environment of a residue, we use the residue environmental sphere, a sphere of radius 10 Å centered on the residue that describes the environment surrounding it. To detect residue-residue contacts quickly and efficiently, we decompose each protein structure into many such spheres; storing both the protein structures and the sphere information in a database is a considerable challenge. We therefore build a database of protein structure, ligand/substrate, and DNA/RNA information that supports fast search and mining of the residue environments in protein structures. Within each residue environmental sphere, neighboring residues can be identified easily, and their properties, including secondary structure, physicochemical properties, and B-factor, can be taken into account. In this paper, we focus on disulfide bonds, which stabilize protein folding. We detect all possible residue contacts between cysteine pairs in three-dimensional space, together with the disulfide bonds between cysteines annotated in the Protein Data Bank, to analyze how disulfide bonds affect protein structures. We represent each protein structure as a set of spheres and build a database of protein structures and their sphere representations for further analysis.

Keywords: Residue environmental sphere (RES), protein structural property mining, residue contact, disulfide bond.
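The two spatial queries the abstract relies on, finding the neighbors inside a 10 Å residue environmental sphere and finding cysteine pairs close enough to form a disulfide bond, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the authors' database-backed indexing mechanism: each residue is reduced to one hypothetical representative point (for example its Cα atom), residues are bucketed into a uniform 3D grid whose cell edge equals the sphere radius (a swapped-in spatial hashing technique), and a cysteine pair is flagged as a disulfide candidate when its SG atoms lie within an assumed 2.5 Å cutoff (a typical S-S bond is about 2.05 Å long). All function names and residue labels below are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]

RES_RADIUS = 10.0        # residue environmental sphere radius, in angstroms
SS_CUTOFF = 2.5          # assumed SG-SG cutoff for disulfide candidates, in angstroms


def build_grid(coords: Dict[str, Point], cell: float = RES_RADIUS) -> Dict[Cell, List[str]]:
    """Hash each residue's representative point into a cubic cell of edge `cell`."""
    grid: Dict[Cell, List[str]] = defaultdict(list)
    for rid, (x, y, z) in coords.items():
        grid[(math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))].append(rid)
    return grid


def res_neighbors(center: str, coords: Dict[str, Point], grid: Dict[Cell, List[str]],
                  radius: float = RES_RADIUS, cell: float = RES_RADIUS) -> List[str]:
    """Return residues whose representative point lies within `radius` of `center`."""
    cx, cy, cz = coords[center]
    i, j, k = math.floor(cx / cell), math.floor(cy / cell), math.floor(cz / cell)
    reach = math.ceil(radius / cell)  # how many neighboring cells the sphere can touch
    found = []
    for di in range(-reach, reach + 1):
        for dj in range(-reach, reach + 1):
            for dk in range(-reach, reach + 1):
                # .get avoids inserting empty cells into the defaultdict
                for rid in grid.get((i + di, j + dj, k + dk), []):
                    if rid != center and math.dist(coords[rid], coords[center]) <= radius:
                        found.append(rid)
    return sorted(found)


def disulfide_candidates(sg_coords: Dict[str, Point],
                         cutoff: float = SS_CUTOFF) -> List[Tuple[str, str]]:
    """Return cysteine pairs whose SG atoms are within `cutoff` of each other."""
    return [(a, b) for a, b in combinations(sorted(sg_coords), 2)
            if math.dist(sg_coords[a], sg_coords[b]) <= cutoff]


if __name__ == "__main__":
    # Toy coordinates, not taken from any PDB entry.
    coords = {"A:CYS3": (0.0, 0.0, 0.0), "A:GLY4": (3.8, 0.0, 0.0),
              "A:LEU20": (9.0, 4.0, 0.0), "A:TRP50": (25.0, 0.0, 0.0)}
    grid = build_grid(coords)
    print(res_neighbors("A:CYS3", coords, grid))
    sg = {"A:CYS3": (0.0, 0.0, 0.0), "A:CYS40": (2.05, 0.0, 0.0), "A:CYS77": (8.0, 0.0, 0.0)}
    print(disulfide_candidates(sg))
```

Because the cell edge equals the sphere radius, each query only scans the 27 cells around the center instead of every residue in the structure; comparing the flagged candidates against the disulfide bonds annotated in the PDB is left out of this sketch.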
1 Introduction

As of July 3, 2007, the Protein Data Bank (PDB) [4] contained 44,476 structures determined by X-ray crystallography or nuclear magnetic resonance (NMR). These entries include proteins, protein complexes, nucleic acids, and protein-nucleic acid complexes. Applying mining techniques to protein structures is a promising way to discover information about the residue environments within them [12, 13, 14]. Residue environments have been studied for many years and applied to protein threading and binding-site characterization [2, 15]. In a protein structure, the residue is the basic unit of conformation, and residue-residue contacts affect the overall

* Corresponding author.