The TOPStrings Protein Structure Comparison Method - A Novel Approach

Mallika Veeramalai 1 and David Gilbert 1

1 Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, UK.

Here, we introduce TOPStrings (TOPS+), a highly abstract string-based model of protein topology which permits efficient computation of structure comparison and can optionally represent ligand information. In this model we consider loops, as well as helices and strands, to be secondary structure elements (SSEs); in addition we represent ligands as first class objects. Interactions between SSEs, and between SSEs and ligands, are described by incoming and outgoing arcs, and SSEs are annotated with arc interaction direction and type. We are able to abstract away from the ligands themselves, to give a model characterized by a regular grammar rather than the context-sensitive grammar of the original TOPS model (Gilbert, et al., 2000; Gilbert, et al., 2001; Viksna and Gilbert, 2001). Our TOPStrings model is sufficiently descriptive to obtain biologically meaningful results, has the advantage of permitting fast string-based structure matching and comparison, and avoids the issues of NP-completeness associated with graph problems.

We have developed a string version of our general graph model, called TOPStrings, in which the relationships between SSEs, and between SSEs and ligands, are reduced to incoming arcs (into an SSE from another SSE earlier in the chain, or from a ligand) and outgoing arcs (from an SSE to an SSE further on in the chain) attached to the SSEs, retaining their arc type properties. Ligands are represented indirectly via their connections to SSEs rather than as first class objects. For example, Fig 1 (a1) and (a2) illustrate the visual representation of our enhanced TOPS model and the reduced TOPStrings model for the protein domain 1fnb01.
Here the triangles represent beta strands; red curves represent alpha helices; circles indicate loop regions; and green arcs indicate hydrogen bonds between two beta strands, forming an antiparallel beta sheet. The length of a TOPStrings representation is given by its number of SSEs; thus the length of 1fnb01 is 18.

We have chosen this linear representation of protein topology because it can be described by a regular grammar, permitting the use of efficient and tractable string-based algorithms for matching and comparison rather than graph-based approaches, which face NP-complete problems. We note that the original TOPS graphs of Gilbert et al. had a strict linear ordering on the nodes, which was exploited in the subgraph isomorphism matching algorithm developed by Viksna and Gilbert (2001). Moreover, their approach was tractable for these TOPS graphs, with an average of 50 nodes or fewer. Our enhanced descriptions effectively double the number of nodes by introducing loops as SSEs; in addition, the introduction of ligand nodes effectively destroys the linear ordering of the nodes in the graph. Our TOPStrings representation is sufficient to perform biologically meaningful protein structure comparison and has the advantage of fast comparisons.

Fig 1. (a1) General enhanced TOPS model; (a2) TOPStrings model.
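
To make the string encoding concrete, the following Python sketch shows one way a chain of SSEs annotated with incoming and outgoing arcs could be flattened into tokens and compared with a standard string algorithm. It is an illustration only, not the TOPStrings implementation: the SSE letters (E, H, L), the arc label "A" for an antiparallel hydrogen-bond arc, the token layout, and the use of Levenshtein edit distance as the comparison measure are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSE:
    """One secondary structure element with its arc annotations (hypothetical encoding)."""
    kind: str             # assumed letters: "E" strand, "H" helix, "L" loop
    incoming: tuple = ()  # arc types arriving from earlier SSEs or from a ligand
    outgoing: tuple = ()  # arc types leaving towards SSEs later in the chain

    def token(self) -> str:
        # Flatten the SSE and its arc annotations into a single comparable symbol.
        inc = "".join(sorted(self.incoming))
        out = "".join(sorted(self.outgoing))
        return f"{self.kind}[{inc}|{out}]"


def edit_distance(a, b):
    """Levenshtein distance between two sequences of SSE tokens (O(len(a) * len(b)))."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Toy beta hairpin: strand, loop, strand, joined by one antiparallel arc "A".
hairpin = [SSE("E", outgoing=("A",)), SSE("L"), SSE("E", incoming=("A",))]
# The same hairpin followed by a loop and a helix.
extended = hairpin + [SSE("L"), SSE("H")]

s1 = [sse.token() for sse in hairpin]
s2 = [sse.token() for sse in extended]

print(len(s1))                # length = number of SSEs: 3
print(edit_distance(s1, s2))  # two extra SSEs to insert: 2
```

Because each arc is stored on the SSEs at its two ends rather than as a separate graph edge, the comparison runs in time quadratic in the number of SSEs and never requires a subgraph isomorphism search. A ligand contact would appear in this sketch only as an extra label in the `incoming` tuple of the SSE it touches.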