Abstract. The positions of a given fold that are always occupied by strong hydrophobic amino acids (V, I, L, F, M, Y, W), which we call ``topohydrophobic positions'', were detected and their properties demonstrated within 153 non-redundant families of homologous domains, through 3D structural alignments. Sets of divergent sequences possessing at least four to five members appear to be as informative as larger sets, provided that their mean pairwise sequence identity is low. Amino acids in topohydrophobic positions exhibit several interesting features: they are much more buried than their equivalents in non-topohydrophobic positions, their side chains are far less dispersed, and they often constitute a lattice of close contacts in the inner core of globular domains. In most cases, each regular secondary structure possesses one to three topohydrophobic positions, which cluster in the domain core. Moreover, using sensitive alignment processes such as hydrophobic cluster analysis (HCA), it is possible to identify topohydrophobic positions from only a small set of divergent sequences. Amino acids in topohydrophobic positions, which can be identified directly from sequences, constitute key markers of protein folds and define long-range structural constraints which, together with secondary structure predictions, limit the number of possible conformations for a given fold.

Key words: Hydrophobic core - Solvent accessibility - Hydrophobicity - Folding - Modelling

1 Introduction
Many proteins are able to fold under physiological conditions without the help of chaperone proteins [1]. Thus, in principle, it should be possible to predict the structure of a protein knowing only its sequence. However, our understanding of the driving forces of protein folding is still insufficient for this task, although there is ample evidence that hydrophobic amino acids play a key role in protein folding [2-8].
Hydrophobicity is one of the best conserved characteristics (of both buried and exposed amino acids) during evolution [9-12], but, surprisingly, buried hydrophobic amino acids are more often mutated than non-hydrophobic ones [13]. By comparing pairs of sequences for homologous domains of known 3D structure, two major populations of strong hydrophobic amino acids can be distinguished: those which share the same position in the two structures (and consequently in the two sequences), whatever their chemical nature, and those which are replaced in the other structure by amino acids that are not strong hydrophobic ones. Calculation of the mean solvent accessibilities of these two populations showed that conserved amino acids are more buried than non-conserved ones. This unpublished study has been extended to the analysis of families of proteins of known structure, within a non-redundant bank of 150 folds constituted for this purpose. Each family was structurally aligned and the properties of the amino acids in positions where only strong hydrophobic amino acids were found, which we call ``topohydrophobic positions'', were studied [14]. The results indicate that these amino acids play a special role in folding and stability. The properties of topohydrophobic positions are demonstrated here through structural alignments, using known 3D structures. However, even more interesting is the possibility of identifying these positions from sequence only, using sensitive sequence comparison methods such as bidimensional hydrophobic cluster analysis (HCA) [15-17], although with a lower accuracy than with structural alignments.
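The definition above can be illustrated with a minimal Python sketch. It is not the procedure used in this work, which relies on 3D structural alignments or HCA; it only assumes that a gapped alignment of a family is already available as equal-length strings. The function names, the toy alignment, the exclusion of columns containing gaps, and the identity measure (identical residues over columns where neither sequence has a gap) are all assumptions made for illustration.

```python
from itertools import combinations

# Strong hydrophobic amino acids as listed in the abstract.
STRONG_HYDROPHOBIC = frozenset("VILFMYW")


def topohydrophobic_positions(alignment):
    """Return 0-based alignment columns occupied only by strong hydrophobic residues.

    Assumption: a column containing a gap ('-') in any sequence is excluded.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col
        for col in range(length)
        if all(seq[col].upper() in STRONG_HYDROPHOBIC for seq in alignment)
    ]


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(alignment):
    """Average pairwise identity over all pairs of sequences in the alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of four divergent sequences (not real data).
toy_alignment = [
    "MKVLAE-FG",
    "MRILSDWYG",
    "LKVFTE-WA",
    "MQLIGNAFS",
]

positions = topohydrophobic_positions(toy_alignment)
identity = mean_pairwise_identity(toy_alignment)
print("topohydrophobic columns:", positions)
print("mean pairwise identity: %.2f" % identity)
```

In this toy case, columns 0, 2, 3 and 7 qualify: each is occupied only by residues of the V, I, L, F, M, Y, W class, even though the residue itself varies from sequence to sequence. The mean pairwise identity is computed alongside because a set of nearly identical sequences would flag many columns trivially; the criterion is informative only when the sequences are divergent.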
Regular article
``Topohydrophobic positions'' as key markers of globular protein folds*
Anne Poupon, Jean-Paul Mornon
Systèmes Moléculaires et Biologie Structurale, LMCP, CNRS UMRC7590, Universités P6 et P7, T16, Case 115, 4 place Jussieu, F-75232 Paris Cedex 05, France
Received: 24 April 1998 / Accepted: 4 August 1998 / Published online: 16 November 1998
Theor Chem Acc (1999) 101:2-8 DOI 10.1007/s002149800m91
*Contribution to the Proceedings of Computational Chemistry and the Living World, April 20-24, 1998, Chambercy, France
Correspondence to: A. Poupon e-mail: poupon@lmcp.jussieu.fr

2 Methods
Protein databanks were searched using the BLAST network server at the NCBI (National Center for Biotechnology Information) with