Interface similarity improves comparison of DNA-binding proteins: the Homeobox example

Álvaro Sebastián¹, Carlos P. Cantalapiedra¹, and Bruno Contreras-Moreira¹,²

¹ Laboratorio de Biología Computacional, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza, España
² Fundación ARAID, Paseo María Agustín 36, Zaragoza, España
http://www.eead.csic.es/compbio
{asebastian,bcontreras}@eead.csic.es

Abstract. The recently published 3D-footprint database contains an up-to-date repository of protein-DNA complexes of known structure that belong to different superfamilies and bind to DNA with distinct specificities. This repository can be scanned by means of sequence alignments to look for similar DNA-binding proteins, which might in turn recognize similar DNA motifs. Here we take the complete set of Drosophila melanogaster Homeobox proteins, which would fall in the largest 3D-footprint superfamily, together with their preferred DNA motifs, recently characterized by Noyes and collaborators, and annotate their interface residues. We then analyze the amino acid substitutions observed at equivalent interface positions and their effect on recognition. Finally, we estimate to what extent interface similarity, computed over the set of residues that mediate DNA recognition, outperforms BLAST expectation values when deciding whether two aligned Homeobox proteins might bind the same DNA motif.

Keywords: protein-DNA interface, DNA motif, substitution matrices

1 Introduction

3D-footprint [1] (http://floresta.eead.csic.es/3dfootprint) is a database that dissects sequence readout in protein-DNA complexes of known structure extracted from the Protein Data Bank [2]. It identifies the molecular contacts that contribute to specific recognition and infers structure-based position weight matrices from the atomic coordinates. Currently the database contains over 2700 complexes, which can be assigned to SCOP superfamilies [3].
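The interface similarity mentioned in the abstract is computed only over the residues that mediate DNA recognition, such as those annotated from 3D-footprint contacts, rather than over the whole alignment. The exact scoring used in this work is not described in this section, so the Python sketch below assumes the simplest variant: the fraction of identical residues at the interface columns of a pairwise alignment, with gaps counted as mismatches. The function name `interface_identity`, the toy sequence fragments and the interface columns are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def interface_identity(aln_a: str, aln_b: str, interface_cols: Sequence[int]) -> float:
    """Fraction of interface alignment columns where both rows carry the same residue.

    Assumed (hypothetical) definition, for illustration only:
    - aln_a, aln_b: the two rows of a pairwise alignment, equal length, gaps as '-'
    - interface_cols: 0-based alignment columns assumed to contact DNA
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if len(interface_cols) == 0:
        raise ValueError("no interface columns given")
    identical = 0
    for col in interface_cols:
        a, b = aln_a[col], aln_b[col]
        # A gap at an interface position is counted as a mismatch.
        if a != "-" and a == b:
            identical += 1
    return identical / len(interface_cols)


# Toy aligned fragments and hypothetical interface columns (not real annotations).
seq_a = "KIWFQNRRMK"
seq_b = "KIWFQNKRAK"
interface = [4, 5, 6, 8]

print(interface_identity(seq_a, seq_b, range(len(seq_a))))  # → 0.8 (all columns)
print(interface_identity(seq_a, seq_b, interface))          # → 0.5 (interface only)
```

In this toy case the two fragments are 80% identical over the whole alignment but only 50% identical at the assumed interface columns, a difference that a score computed over the full sequence would not reflect.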
After removing redundancy, the most populated superfamily turns out to be that of homeodomain-like proteins, including Homeobox transcription factors, which have been the subject of extensive crystallographic and spectroscopic studies due to their key role in developmental processes in multicellular organisms [4]. Furthermore, Homeobox proteins have been of special interest since the publication of the work by Noyes and collaborators [5], in which the authors characterized the