Sequence-Specific Random Coil Chemical Shifts of Intrinsically Disordered Proteins
Kamil Tamiola, Burçin Acar, and Frans A. A. Mulder*
Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9749 AG Groningen, The Netherlands
Received June 28, 2010; E-mail: f.a.a.mulder@rug.nl
Abstract: Although intrinsically disordered proteins (IDPs) are widespread in nature and play diverse and important roles in biology, they have to date been little characterized structurally. Auspiciously, intensified efforts using NMR spectroscopy have started to uncover the breadth of their conformational landscape. In particular, polypeptide backbone chemical shifts are emerging as powerful descriptors of local dynamic deviations from the “random coil” state toward canonical types of secondary structure. These digressions, in turn, can be connected to functional or dysfunctional protein states, for example, in adaptive molecular recognition and protein aggregation. Here we describe a first inventory of IDP backbone ¹⁵N, ¹Hᴺ, ¹Hα, ¹³C′, ¹³Cβ, and ¹³Cα chemical shifts using data obtained for a set of 14 proteins of unrelated sequence and function. Singular value decomposition was used to parametrize this database of 6903 measured shifts collectively in terms of 20 amino acid-specific random coil chemical shifts and 40 sequence-dependent left- and right-neighbor correction factors, affording the ncIDP library. For natively unfolded proteins, random coil backbone chemical shifts computed from the primary sequence displayed root-mean-square deviations of 0.65, 0.14, 0.12, 0.50, 0.36, and 0.41 ppm from the experimentally measured values for the ¹⁵N, ¹Hᴺ, ¹Hα, ¹³C′, ¹³Cβ, and ¹³Cα chemical shifts, respectively. The ncIDP prediction accuracy is significantly higher than that obtained with libraries for small peptides or “coil” regions of folded proteins.
Introduction
In recent years, NMR spectroscopy has proven to be singular in its capacity to study intrinsically disordered proteins (IDPs) with atomic detail [1-7]. Because of the lack of a unique three-dimensional structure, the conformational state of IDPs is described by extensive ensembles derived from a thoroughgoing analysis of various experimental data [5,6,8-10]. As an alternative to comprehensive structure determination, NMR chemical shifts are of significant value, since they reflect the conformational preferences of polypeptide chains with atomic resolution [11-13]. Flexible peptides and unfolded proteins display “random coil” chemical shifts, which in turn can be used as a hallmark of disorder. The deviation of a measured chemical shift from its random coil value indicates the relative tendency of the polypeptide chain to adopt either helical or extended conformations at that point in the primary sequence [11], thereby offering a sensitive and accurate proxy for changes in protein (dis)order and dynamics [12,14-16].
Here we describe the first neighbor-corrected random coil chemical shift library for intrinsically disordered proteins, ncIDP, which enables the straightforward and accurate prediction of nuclear shielding constants for a polypeptide sequence. To generate this library, we manually compiled a list of the chemical shifts for 14 polypeptides that have been demonstrated in independent studies to be intrinsically disordered. For 12 of these, the resonance assignments were obtained from the BioMagResBank (BMRB) repository [17], and two further IDPs were assigned in our lab [see Table S1 and the Supporting Information (SI) for details].
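The secondary-shift reasoning used in this introduction (measured shift minus random coil reference) can be illustrated with a minimal Python sketch. The numerical values, the 0.5 ppm threshold, and the function names `secondary_shifts` and `ca_tendency` are hypothetical and chosen only for illustration; they are not taken from the ncIDP library. The sign convention for ¹³Cα (positive toward helix, negative toward extended conformations) is the standard one.

```python
def secondary_shifts(observed, random_coil):
    """Secondary shift per residue: observed minus random coil reference (ppm)."""
    return [obs - rc for obs, rc in zip(observed, random_coil)]


def ca_tendency(delta, threshold=0.5):
    """Label a 13C-alpha secondary shift; the threshold is a hypothetical cutoff."""
    if delta > threshold:
        return "helix-like"
    if delta < -threshold:
        return "extended-like"
    return "coil-like"


# Made-up 13C-alpha shifts (ppm) for three residues, for illustration only.
observed = [53.6, 56.4, 54.9]
random_coil = [52.5, 56.3, 56.0]

deltas = secondary_shifts(observed, random_coil)
labels = [ca_tendency(d) for d in deltas]
print(labels)
```

In practice the quality of such labels depends entirely on how accurate the random coil reference values are, which is the motivation for a sequence-corrected library.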
Using a total of 6903 experimental nuclear shielding constants, we solved eq 1. Equation 1 states that for each protein entry i, the observed chemical shift of a nucleus n ∈ {¹Hα, ¹Hᴺ, ¹³Cα, ¹³Cβ, ¹³C′, ¹⁵N} in an amino acid a embedded in the tripeptide sequence x-a-y consists of a random coil reference value $\delta_{\mathrm{RC}}^{n}(a)$, a left-neighbor correction $\Delta_{-1}^{n}(x)$, and a right-neighbor correction $\Delta_{+1}^{n}(y)$. The fourth parameter, $\varepsilon^{n}(i)$, is available to account for chemical shift offsets due to alternative referencing and also subsumes systematic deviations due to variations in pH or temperature. A single offset is included for chemical shifts of type n for each entry i. In a first round, the set of linear equations defined by eq 1 was solved for the 6903 experimental chemical shifts using singular value decomposition (SVD). The SVD algorithm effectively determined the ncIDP random coil chemical shift library, which comprises the reference chemical shift values of the 20 amino acids a when adjoined by glycine along with the 40 amino acid-specific corrections.
The presence of structure results in local changes in the (ensemble distribution of) bond angles, which are manifested through sequence-dependent deviations from the random coil chemical shifts. For example, the ¹³Cα chemical shift increases upon formation of α-helix and decreases in the context of a β-strand. On the basis of various types of experimental data, reports in the literature for the IDPs studied here indicate that these polypeptides do not form stable secondary or tertiary structures but sometimes display small segments that attain weakly populated, transient forms of organization.
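The linear model of eq 1 and its SVD solution can be sketched as follows for a single nucleus type. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic and noise-free, the helper names (`design_row`, `fit_library`, `regauge_to_glycine`) are hypothetical, and the handling of the rank deficiency (a minimum-norm SVD solution followed by a shift that sets both glycine neighbor corrections to zero) is one plausible reading of "reference values when adjoined by glycine".

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
N_AA = len(AMINO_ACIDS)


def design_row(x, a, y, entry, n_entries):
    """Indicator row for eq 1: delta_RC(a), Delta_-1(x), Delta_+1(y), epsilon(i)."""
    row = np.zeros(3 * N_AA + n_entries)
    row[AA_INDEX[a]] = 1.0              # random coil value of the central residue
    row[N_AA + AA_INDEX[x]] = 1.0       # left-neighbor correction
    row[2 * N_AA + AA_INDEX[y]] = 1.0   # right-neighbor correction
    row[3 * N_AA + entry] = 1.0         # per-entry referencing offset
    return row


def fit_library(observations, n_entries):
    """Minimum-norm least-squares fit via SVD.

    observations: list of (x, a, y, entry, shift) tuples for one nucleus type.
    """
    A = np.array([design_row(x, a, y, i, n_entries) for x, a, y, i, _ in observations])
    b = np.array([obs[4] for obs in observations])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > 1e-10 * s.max()          # drop null directions (the model is rank deficient)
    params = Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])
    return A, b, params


def regauge_to_glycine(params):
    """Shift parameters so both glycine neighbor corrections are zero (assumed convention)."""
    p = params.copy()
    g_left = p[N_AA + AA_INDEX["G"]]
    g_right = p[2 * N_AA + AA_INDEX["G"]]
    p[:N_AA] += g_left + g_right        # predictions are unchanged by this shift
    p[N_AA:2 * N_AA] -= g_left
    p[2 * N_AA:3 * N_AA] -= g_right
    return p


# Synthetic, noise-free data standing in for the measured shifts.
rng = np.random.default_rng(0)
n_entries = 3
truth = rng.normal(size=3 * N_AA + n_entries)
observations = []
for _ in range(600):
    x, a, y = (AMINO_ACIDS[k] for k in rng.integers(N_AA, size=3))
    entry = int(rng.integers(n_entries))
    shift = float(design_row(x, a, y, entry, n_entries) @ truth)
    observations.append((x, a, y, entry, shift))

A, b, params = fit_library(observations, n_entries)
regauged = regauge_to_glycine(params)
residual = float(np.max(np.abs(A @ regauged - b)))
print(f"largest residual: {residual:.2e}")
```

Because a constant can be traded between the reference values, the neighbor corrections, and the offsets without changing any predicted shift, individual parameters are only defined up to such a convention; the predicted shifts themselves are unique.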
Thus, if accurate random coil chemical shifts are available, the distribution of secondary chemical shifts would be expected to consist of a sharp peak centered at zero for those nuclei present in random coil regions, augmented with broader features arising from segments that exhibit various levels of digression from the random coil state. Figure 1 demonstrates that this is indeed what was observed when the ncIDP library was used as a reference set. The features observed in Figure 1 are not unique to ¹Hα chemical shifts but are visible for all chemical shifts that are sensitive to backbone conformation [12] (see Figure S1 in the SI). Since a portion of the data contains conformational bias away from the random coil state, as gauged from the secondary chemical shifts, we devised a self-consistent optimization protocol based on multiple linear regression (described in the SI) to identify outliers in the experimental data and subsequently eliminate them prior to the derivation of a new, curated ncIDP library from the remaining data.
$$\delta^{n}(x, a, y, i) = \delta_{\mathrm{RC}}^{n}(a) + \Delta_{-1}^{n}(x) + \Delta_{+1}^{n}(y) + \varepsilon^{n}(i) \tag{1}$$
Published on Web 12/03/2010. DOI: 10.1021/ja105656t. © 2010 American Chemical Society. J. Am. Chem. Soc. 2010, 132, 18000–18003.
Through