Icing: Large-scale Inference of Immunoglobulin Clonotypes

Federico Tomasi 1 [0000-0002-8718-3844], Margherita Squillario 1 [0000-0002-6612-3383], Alessandro Verri 1 [0000-0001-9777-9986], Davide Bagnara 2,3 ⋆⋆ [0000-0001-7889-8103], and Annalisa Barla 1 ⋆⋆ [0000-0002-3436-035X]

1 Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), Università degli studi di Genova, Genoa, I-16146, Italy
2 Department of Experimental Medicine (DIMES), Università degli studi di Genova, Genoa, I-16132, Italy
3 The Feinstein Institute for Medical Research, North Shore-LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA

Abstract. Immunoglobulin (IG) clonotype identification is a fundamental open question in modern immunology. An accurate description of the IG repertoire is crucial for understanding the variety within an individual's immune system and may shed light on the pathogenetic process. Intrinsic IG heterogeneity makes clonotype inference an extremely challenging task, from both a computational and a biological point of view. Here we present icing, a framework that can reconstruct clonal families even when sequences are highly mutated. icing has a modular structure and is designed for large next generation sequencing (NGS) datasets, a technology that allows the characterisation of large-scale IG repertoires. We extensively validated the framework on simulated data, evaluating its results with clustering performance metrics. icing is implemented in Python and is publicly available under the FreeBSD licence at https://github.com/slipguru/icing.

Keywords: Clonotype identification · Immunoglobulin · NGS data · Cluster analysis

1 Scientific Background

The identification of immunoglobulin (IG) clonotypes is a key question in modern immunology. A clonotype is a particular combination of IGs generated by a single plasma cell clone, that is, a population of cells all derived from a single progenitor cell (germline).
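To make the notion of a clonotype more concrete, here is a minimal sketch of a simple heuristic commonly used as a first step in clonotype inference: two sequences can belong to the same candidate group only if they share the same V gene call, J gene call and junction length. This is an illustrative assumption, not a description of icing's own algorithm; the record type `IgRecord`, its fields and the function `group_by_vj_junction_length` are hypothetical names introduced here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class IgRecord(NamedTuple):
    """Hypothetical minimal IG sequence record (field names are illustrative)."""
    seq_id: str
    v_gene: str
    j_gene: str
    junction: str


def group_by_vj_junction_length(
    records: List[IgRecord],
) -> Dict[Tuple[str, str, int], List[str]]:
    """Group sequence IDs that share V gene, J gene and junction length.

    Each group is only a candidate set of clonally related sequences;
    a finer, similarity-based clustering would still be needed to split it.
    """
    groups: Dict[Tuple[str, str, int], List[str]] = defaultdict(list)
    for rec in records:
        # The key ignores mutations: only gene calls and junction length matter.
        key = (rec.v_gene, rec.j_gene, len(rec.junction))
        groups[key].append(rec.seq_id)
    return dict(groups)


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    toy = [
        IgRecord("s1", "IGHV1-2", "IGHJ4", "TGTGCGAGAGATTGG"),
        IgRecord("s2", "IGHV1-2", "IGHJ4", "TGTGCGAGAGACTGG"),
        IgRecord("s3", "IGHV3-23", "IGHJ6", "TGTGCGAAAGAT"),
    ]
    for key, ids in group_by_vj_junction_length(toy).items():
        print(key, ids)
```

In the toy data, `s1` and `s2` differ by a single junction nucleotide (a stand-in for a somatic mutation) and still land in the same group. Because members of one clone accumulate mutations, such groups are only candidates: distinct clones that happen to share gene calls and junction length would still need to be separated by a similarity-based step.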
Corresponding author: federico.tomasi@dibris.unige.it
⋆⋆ These authors contributed equally to this work.

The ability to infer clonotypes is crucial, as it allows one to understand how much diversity an individual has in their immune repertoire and to