Acta Cryst. (1990). A46, 560-567
Direct Methods with Single Isomorphous Replacement Data. I. Reduction of Systematic Errors
BY W. FUREY JR,* K. CHANDRASEKHAR, F. DYDA† AND M. SAX
Biocrystallography Laboratory, PO Box 12055, Veterans Administration Medical Center, University Drive C, Pittsburgh, PA 15240, USA and Department of Crystallography, 304 Thaw Hall, University of Pittsburgh, Pittsburgh, PA 15260, USA
(Received 20 March 1989; accepted 28 February 1990)
Abstract
The direct-methods procedure for single isomorphous replacement (SIR) data [Hauptman (1982). Acta Cryst. A38, 289-294], as modified by Fortier, Moore & Fraser [Acta Cryst. (1985), A41, 571-577], has been implemented and tested with a large number of known structures. It was found that the modified procedure greatly reduces the bias toward 'unresolved' SIR invariant values associated with estimates of 0 or π, but does not remove it entirely. If the heavy atoms are not in a centrosymmetric array, the centroid of the distribution of invariant estimates is not centered on true protein values, but is biased toward conventional SIR values by up to 15°; thus errors in the estimates are not random but systematic. When the heavy atoms are in a centrosymmetric array (or there is a single heavy-atom site in space group P2₁), the distribution of estimates is often sharply bimodal, with peaks centered at both true invariant values and pure 'unresolved' SIR values. Simple procedures are given which can be applied in both situations to reduce the bias significantly with no overall loss of accuracy. An additional correction factor is then described which can be used to remove nearly all of the bias, and to improve the accuracy as well. The result is that errors in the corrected invariant estimates are not only small in magnitude but also random rather than systematic. Since the number of estimates greatly exceeds the number of phases, the remaining random errors should have little impact in phasing processes.
* To whom correspondence should be addressed.
† In partial fulfilment of the Doctor of Philosophy Degree.
Introduction
In recent years, theoretical developments in the area of direct methods as applied to protein crystallography have advanced considerably. In particular, a theory for the integration of direct methods with single isomorphous replacement (Hauptman, 1982) looked very promising in that it was possible accurately to identify large numbers of three-phase structure invariants with values of 0 or π, even for very large structures. Other procedures capable of identifying invariants with values of 0 or π from single-isomorphous-replacement data were also developed (Karle, 1983; Giacovazzo, Cascarano & Zheng, 1988). Unfortunately, it was shown (Xu, Yang, Furey, Sax, Rose & Wang, 1984) that invariant values of 0 or π are not particularly useful for protein crystallography since they generally correspond to the heavy-atom invariants (or heavy-atom invariants plus π) of the included derivative. Any procedure which forces individual phases to satisfy such invariants therefore produces classical 'unresolved' SIR (single isomorphous replacement) phases, since the invariants themselves are actually SIR invariants (e.g. invariants produced by summing over three SIR phases). The realization of the correspondence with SIR phases prompted a re-examination and modification of Hauptman's formulation (Fortier, Moore & Fraser, 1985), resulting in a new procedure which should be considerably more powerful. With this modification it is possible accurately to identify large numbers of invariants with absolute values anywhere in the range 0-π; however, only the magnitude of the angle can be identified (i.e. the cosine invariant). By moving away from 0 and π values the bias toward SIR invariants should be diminished and the resulting estimates should become more useful for the determination of individual protein phases.
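To make the distinction between an invariant and its cosine concrete, the following minimal Python sketch forms a three-phase invariant as the sum of three phases and lists the two signed angles compatible with a given cosine. It is an illustration only, not part of the procedures discussed in this paper; the function names and the numerical phase values are hypothetical, and angles are taken in degrees.

```python
import math


def wrap_angle(deg):
    """Map an angle in degrees onto the interval (-180, 180]."""
    wrapped = (deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def three_phase_invariant(phi_h, phi_k, phi_minus_h_minus_k):
    """Sum of three phases whose reciprocal-lattice indices add to zero."""
    return wrap_angle(phi_h + phi_k + phi_minus_h_minus_k)


def candidates_from_cosine(cos_value):
    """Return the two signed angles (+m, -m) that share a given cosine.

    An estimate of the cosine fixes only the magnitude m in [0, 180];
    the sign of the invariant is left undetermined.
    """
    clipped = max(-1.0, min(1.0, cos_value))  # guard against rounding
    magnitude = math.degrees(math.acos(clipped))
    return magnitude, -magnitude


# Hypothetical phases (degrees) for reflections h, k and -h-k.
invariant = three_phase_invariant(100.0, 150.0, 170.0)
plus, minus = candidates_from_cosine(math.cos(math.radians(invariant)))
```

For an invariant of exactly 0 or π the two candidates coincide (modulo 2π), so no sign ambiguity arises; for any intermediate magnitude the cosine alone leaves two possible values.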
In all previous studies the proposed methods were tested with error-free data, usually for a single structure; thus the general applicability has not been demonstrated. In the current study we have applied the modified formulation of Fortier, Moore & Fraser to numerous structures taken from the Protein Data Bank (Bernstein et al., 1977) to determine whether the accuracy of the estimates is sensitive to space group, structure size and heavy-atom substitution parameters. It was found that although the Fortier modification greatly reduces the bias towards SIR invariants, it does not remove it entirely: a residual bias of up to 15° remains. Several alternative modifications to the procedure are now reported, all of which lead to further reductions in the bias towards SIR, and one of which can significantly improve the accuracy of the estimates as well. With the new
© 1990 International Union of Crystallography