ARCHIVES OF BIOCHEMISTRY AND BIOPHYSICS
Vol. 294, No. 1, April, pp. 107-114, 1992

Improvement of Protein Secondary Structure Prediction by Combination of Statistical Algorithms and Circular Dichroism¹

Enrico A. Carrara, Cesare Gavotti, Paolo Catasti, Fabrizio Nozza, Luisella L. Berutti Bergotto, and Claudio A. Nicolini²

Institute of Biophysics, Medical School, University of Genoa, Salita Superiore Note 35, 16132 Genoa, Italy

Received August 26, 1991, and in revised form November 27, 1991

Three different approaches (propensity curve shifting, hydropathy index evaluation, and iterative attribution/cancellation of secondary structure) to the use of secondary structure percentages derived from circular dichroism measurements to improve the success rate of a protein secondary structure prediction method, without using decision constants, are described and compared. Propensity-curve shifting appears to be the best-performing approach, yielding an increase of 5.3% in the success rate of single-residue structural prediction when exact information on the secondary structure, obtained by X-ray crystallography, is employed; with information of an accuracy comparable to that obtainable by circular dichroism, the improvement stays between 3.5 and 4.9% for a three-state prediction. Although developed with circular dichroism in mind, the method can use percentages of secondary structure obtained by any other experimental methodology from which they can be inferred, for instance Raman spectroscopy and infrared spectroscopy.

© 1992 Academic Press, Inc.

The aim of the work described in this paper is to improve the accuracy and reliability of protein secondary structure predictions based on the primary sequence and statistical information by using CD-derived³ secondary structure percentages.
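The two quantities on which this work relies, the secondary structure percentages of a protein and the single-residue success rate of a three-state prediction, can be illustrated with a minimal sketch. It is not part of the original work: the one-letter state labels (H for helix, E for strand, C for everything else), the function names, and the example strings are assumptions introduced here for illustration only.

```python
from collections import Counter

# Assumed three-state labels (not taken from the paper):
# H = helix, E = extended strand, C = coil / other.
STATES = ("H", "E", "C")


def structure_percentages(states: str) -> dict[str, float]:
    """Return the percentage of residues assigned to each of the three states."""
    if not states:
        raise ValueError("empty structure string")
    counts = Counter(states)
    return {s: 100.0 * counts[s] / len(states) for s in STATES}


def success_rate(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose state is predicted correctly."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


# Hypothetical ten-residue example, one state letter per residue.
observed = "CHHHHCEEEC"   # e.g. an assignment derived from a crystal structure
predicted = "CHHHCCEEEE"  # e.g. the output of a statistical prediction method

print(structure_percentages(observed))
print(success_rate(predicted, observed))
```

In this picture, an experimental technique such as CD supplies an estimate of the output of `structure_percentages` without revealing the per-residue string itself, and the improvements quoted in the abstract are differences in the value returned by `success_rate`.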
¹ The work described here has been carried out under the research contract N. 1140 with Tecnofarmaci Spa., Pomezia, Italy, within the Advanced Biotechnology National Research Plan of the Italian Minister of University and Scientific and Technological Research.
² To whom correspondence should be addressed at Instituto di Biofisica, Facoltà di Medicina e Chirurgia, Università degli Studi di Genova, Salita Superiore Note 35, 16132 Genoa, Italy.
³ Abbreviation used: CD, circular dichroism.

0003-9861/92 $3.00 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

It is now universally accepted that the tertiary structure of a protein after folding is strictly determined by its primary sequence, as shown by (1), but the present understanding of the laws governing protein folding is not deep enough to allow, in general, a priori tertiary structure prediction from the primary sequence alone. A wide sample of the available structure prediction paradigms can be found in (2).

The most popular structural target for prediction methods is the secondary structure. Secondary structure motifs can be shared even by otherwise unrelated proteins, and are amenable to rigorous definition and precise attribution to portions of tertiary structures (3). This makes it easy to extract statistical information from structural databases and to evaluate the performance of secondary structure prediction methods on known structures. In the following, we will refer to protein secondary structure prediction as structure prediction, or simply prediction, unless otherwise specified.

The success rate of prediction methods averages, for the best of them (2, 5), around 63-65% for three-state predictions on proteins nonhomologous to those included in the training database.
This result is much better than could be either expected from random prediction or obtained from ab initio calculations, but it is not good enough for reliable model construction of the structure.

Since several relatively simple experimental techniques, such as CD (for reviews see (6) and (7)), Raman spectroscopy (8), and infrared spectroscopy (9), allow the estimation of the secondary structure content of proteins, it would be desirable to use this information to improve the accuracy of structure prediction. Garnier et al. (10) describe methods to improve prediction by means of calibrated decision constants depending on the secondary structure content; unfortunately, calibrating decision constants on the same data used for testing introduces a circularity that renders any claim based on the results of the tests questionable. A neural-network-based approach (11) appears to be able to exploit information on structural