Combinations of long peptide sequence blocks can be used to describe toxin diversification in venomous animals

Antonio Starcevic a, Ana M. Moura-da-Silva b, John Cullum c, Daslav Hranueli a, Paul F. Long d, e, f, g, *

a Section for Bioinformatics, Department of Biochemical Engineering, Faculty of Food Technology & Biotechnology, University of Zagreb, Pierottijeva 6, HR-10000 Zagreb, Croatia
b Laboratório de Imunopatologia, Instituto Butantan, Av. Vital Brasil 1500, 05503-900 São Paulo, SP, Brazil
c Department of Genetics, University of Kaiserslautern, Postfach 3049, 67653 Kaiserslautern, Germany
d Institute of Pharmaceutical Science, King's College London, United Kingdom
e Department of Chemistry, King's College London, United Kingdom
f Brazil Institute, King's College London, United Kingdom
g Faculdade de Ciências Farmacêuticas, Universidade de São Paulo, Av. Prof. Lineu Prestes, 580, B16, 05508-000 São Paulo, SP, Brazil

Article info
Article history: Received 24 November 2014; received in revised form 7 January 2015; accepted 13 January 2015; available online 14 January 2015
Keywords: Toxin diversification; Evolution; Multiple alignments; Hidden Markov models

Abstract
An important mechanism for the evolution of toxins in venomous animals is believed to be the acquisition of genes encoding proteins that switch from physiological to toxic roles following gene duplication. The 'reverse recruitment' hypothesis posits that these genes can also revert to physiological functions, although such events are thought to be rare. A non-supervised homology searching method was developed that allowed the peptide diversity of animal toxins to be described as combinations of a limited number of amino-acid sequence blocks, which we called 'tox-bits'. Taking the phospholipase A2 (PLA2) protein family as an example, a Bernoulli trial was used to test whether 'tox-bits' were robust enough to distinguish between peptides with physiological and toxin functions.
The analysis revealed that discrimination was indeed possible, which supports the very recent 'restriction' hypothesis, whereby genes with the potential to encode toxic functions have likely been independently recruited into venom systems and therefore require few, if any, reverse recruitment events. The development of 'tox-bits' provides a novel bioinformatics tool for distinguishing toxins from other proteins in genome sequences, facilitating the study of gene recruitment and duplication strategies in venom diversification. The 'tox-bits' library is freely available at http://bioserv.pbf.hr/blocks.zip.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Animal venoms are cocktails composed mainly of proteins and peptides, colloquially referred to as toxins. These toxins are delivered into a victim for the purposes of defence and/or predation by inflicting a wound using specialized apparatus such as fangs, pincers, barbs and harpoons. The diversity of venomous animals is staggering. Examples include marine invertebrates (e.g. cnidarians such as jellyfish and sea anemones, cone snails and other gastropods, cephalopods and echinoderms); marine and freshwater fish; aquatic and terrestrial amphibians; reptiles, especially snakes; a plethora of arachnid groups, most notably scorpions and spiders; insects; and even species of mammals, including shrews, the platypus and a recently discovered slow loris primate, Nycticebus menagensis (Fry et al., 2009; Whittington et al., 2010; Nekaris et al., 2013). Venom composition and mechanisms of delivery vary markedly, which often reflects the function of venom in the natural history of a given species, but also strongly implies that venoms have evolved independently across different phyla of the animal kingdom (Casewell et al., 2013). This complex chemical diversity is believed to have arisen by convergent recruitment of ancestral genes into different animals.
Toxicon 95 (2015) 84-92, ISSN 0041-0101
http://dx.doi.org/10.1016/j.toxicon.2015.01.005

* Corresponding author. Institute of Pharmaceutical Science, King's College London, 150 Stamford Street, London SE1 9NH, United Kingdom. E-mail address: paul.long@kcl.ac.uk (P.F. Long).

These ancestral genes encoded proteins that then switched from physiological to toxic roles through gene duplication followed by rapid hyper-mutation (Fry, 2005). Reverse recruitment of these toxin genes back to physiological functions in non-venomous tissues is also