JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 8, Number 6, 2001
Mary Ann Liebert, Inc.
Pp. 571–583

Algorithms for Identifying Protein Cross-Links via Tandem Mass Spectrometry

TING CHEN,1 JACOB D. JAFFE,2 and GEORGE M. CHURCH3

ABSTRACT

Cross-linking technology combined with tandem mass spectrometry (MS-MS) is a powerful method that provides a rapid solution to the discovery of protein–protein interactions and protein structures. We studied the problem of detecting cross-linked peptides and cross-linked amino acids from tandem mass spectral data. Our method consists of two steps: the first step finds two protein subsequences whose mass sum equals a given mass measured from the mass spectrometry; and the second step finds the best cross-linked amino acids in these two peptide sequences that are optimally correlated to a given tandem mass spectrum. We designed fast and space-efficient algorithms for these two steps and implemented and tested them on experimental data of cross-linked hemoglobin proteins. An interchain cross-link between two β subunits was found in two tandem mass spectra. The length of the cross-linker (7.7 Å) is very close to the actual distance (8.18 Å) obtained from the molecular structure in PDB.

Key words: proteomics, mass spectrometry, algorithms, protein cross-linking, protein–protein interactions, protein folding.

1. INTRODUCTION

In recent years, more and more genomes of model organisms have been sequenced. Using these genomic sequences, researchers have focused on the identification of genes on the genome, the study of gene regulation and gene regulatory networks, the discovery of signal transduction pathways, the determination of protein structures, the detection of protein–protein, protein–DNA, and protein–metabolite interactions, and the elucidation of functions of genes and their protein products.
A method which combines chemical cross-linking of proteins with mass spectrometry or tandem mass spectrometry may be useful in discovering protein complexes, determining their structure (Young et al., 2000), and/or quantitating in vivo concentrations. This paper focuses on new algorithms for interpretation of complex experimental data generated by protein cross-linking and tandem mass spectrometry.

1 Department of Molecular Biology, University of Southern California, Los Angeles, CA 90089.
2 Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138.
3 Department of Genetics, Harvard Medical School, Boston, MA 02115.

Traditionally, three-dimensional structures of proteins are solved by x-ray crystallography and NMR. However, generating an accurate structure that satisfies constraints of experimental data can be extremely