Appl Bioinformatics 2004; 3 (2-3): 105-113
ORIGINAL RESEARCH
1175-5636/04/0002-0105/$31.00/0
© 2004 Adis Data Information BV. All rights reserved.

Natively Disordered Proteins: Functions and Predictions

Pedro Romero,1 Zoran Obradovic2 and A. Keith Dunker1

1 Center for Computational Biology and Bioinformatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
2 Center for Information Science and Technology, Temple University, Philadelphia, Pennsylvania, USA

Abstract

Proteins can exist in at least three forms: the ordered form (solid-like), the partially folded form (collapsed, molten globule-like or liquid-like) and the extended form (extended, random coil-like or gas-like). The protein trinity hypothesis has two components: (i) a given native protein can be in any one of the three forms, depending on the sequence and the environment; and (ii) function can arise from any one of the three forms or from transitions between them. In this study, bioinformatics and data mining were used to investigate intrinsic disorder in proteins and develop neural network-based predictors of natural disordered regions (PONDR) that can discriminate between ordered and disordered residues with up to 84% accuracy. Predictions of intrinsic disorder indicate that the three kingdoms follow the disorder ranking eubacteria < archaebacteria << eukaryotes, with approximately half of eukaryotic proteins predicted to contain substantial regions of intrinsic disorder. Many of the known disordered regions are involved in signalling, regulation or control. Involvement of highly flexible or disordered regions in signalling is logical: a flexible sensor more readily undergoes conformational change in response to environmental perturbations than does a rigid one. Thus, the increased disorder in the eukaryotes is likely the direct result of an increased need for signalling and regulation in nucleated organisms.
PONDR can also be used to detect molecular recognition elements that are disordered in the unbound state and become structured when bound to a biologically meaningful partner. Application of disorder predictions to cell-signalling, cancer-associated and control protein databases supports the widespread occurrence of protein disorder in these processes.

For more than 100 years, the notion that a specific 3-dimensional (3-D) structure determines a protein’s function has dominated our thinking about these macromolecules. This paradigm originated from the lock-and-key proposal of Fischer, [1] was strongly reinforced by the explanation of denaturation as the loss of specific structure by Wu [2] and independently by Mirsky and Pauling, [3] and has continued up until the present as a result of the structural characterisation of many specific examples.

This presumed universal dependence of protein function on 3-D structure even affects protein science terminology: unfolded protein and denatured protein are used interchangeably. Furthermore, the increasingly faster accumulation of protein 3-D structures by x-ray diffraction and nuclear magnetic resonance (NMR), [4] with the consequent enlargement of structural databases, has diverted attention away from alternative views.

Numerous counterexamples to the dominant structure-function paradigm have surfaced over the years – proteins for which lack of 3-D structure is required for function. [5,6] Such proteins have been called ‘natively unfolded’, [7] ‘intrinsically unstructured’ [6] and ‘natively or intrinsically disordered’. The key feature of such proteins and protein regions is the failure to adopt a particular 3-D structure under apparently physiological conditions; these proteins exist instead as ensembles of rapidly interconverting structural forms.

Examples of more than 100 intrinsically disordered proteins along with their functions were previously compiled from manual literature searches. [8] These examples are summarised and grouped into four broad functional categories: (i) molecular recognition;