D418–D427 Nucleic Acids Research, 2023, Vol. 51, Database issue
Published online 9 November 2022
https://doi.org/10.1093/nar/gkac993

InterPro in 2022

Typhaine Paysan-Lafosse 1,*, Matthias Blum 1, Sara Chuguransky 1, Tiago Grego 1, Beatriz Lázaro Pinto 1, Gustavo A. Salazar 1, Maxwell L. Bileschi 2, Peer Bork 3,15,16, Alan Bridge 4, Lucy Colwell 2,5, Julian Gough 6, Daniel H. Haft 7, Ivica Letunić 8, Aron Marchler-Bauer 7, Huaiyu Mi 9, Darren A. Natale 10, Christine A. Orengo 11, Arun P. Pandurangan 6,12, Catherine Rivoire 4, Christian J.A. Sigrist 4, Ian Sillitoe 11, Narmada Thanki 7, Paul D. Thomas 9, Silvio C.E. Tosatto 13, Cathy H. Wu 10,14 and Alex Bateman 1

1 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK, 2 Google Research, Brain team, Cambridge, MA, USA, 3 European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany, 4 Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland, 5 Department of Chemistry, University of Cambridge, Cambridge, UK, 6 Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK, 7 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA, 8 Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany, 9 Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA, 10 Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA, 11 Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK, 12 Department of Biochemistry, Sanger Building, University of Cambridge,
Cambridge, UK, 13 Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy, 14 Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark, DE 19711, USA, 15 Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea and 16 Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany

Received September 14, 2022; Revised October 12, 2022; Editorial Decision October 14, 2022; Accepted October 28, 2022

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

INTRODUCTION

Advances in genomic technologies together with substantial reductions in the cost of sequencing have enabled the scientific community to generate new sequencing data at an unprecedented scale. To be useful to the scientific community, these hundreds of millions of sequences need to be analysed and characterised, which can often be an issue as the computational time necessary to analyse those sequences is increasing exponentially.
To address this challenge, several automated sequence analysis methods have been developed to annotate protein families, domains and functional sites by transferring the information, often from an experimentally characterised sequence, to uncharacterised sequences using predictive diagnostic models (hidden Markov models, patterns, profiles or fingerprints), known as signatures.

* To whom correspondence should be addressed. Tel: +44 1223494344; Email: typhaine@ebi.ac.uk

© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.