ORIGINAL ARTICLE

Improved HLA typing of Class I and Class II alleles from next-generation sequencing data

Angelina Sverchkova | Irantzu Anzar | Richard Stratford | Trevor Clancy

NEC OncoImmunity AS, Oslo Cancer Cluster, Innovation Park, Oslo, Norway

Correspondence
Trevor Clancy, NEC OncoImmunity AS, Oslo Cancer Cluster, Innovation Park, Oslo, Norway.
Email: trevor@oncoimmunity.com

Funding information
Norges Forskningsråd, Grant/Award Number: Nærings PhD; Norwegian Research Council

Precise HLA genotyping is of great clinical importance, but it is a challenging bioinformatics endeavor because of the hyperpolymorphism of the HLA region. The ever-increasing availability of next-generation sequencing (NGS) solutions has spurred the development of several computational methods for predicting HLA genotypes from NGS data. Although some of these tools genotype HLA Class I alleles reasonably well, integrative parameters such as ethnic allele-frequency information need to be incorporated to improve performance for both Class I and Class II alleles. Here, we present a bioinformatics method that addresses some of the current shortfalls in HLA genotyping from NGS. First, reads that map to the HLA region are aligned against a comprehensive library of reference HLA alleles. The allele type is then determined from the distribution of aligned reads and from prior probabilities based on the ethnic frequencies of the alleles. Three public NGS datasets were used to benchmark the approach against six similar tools. The method outlined in this manuscript achieved an overall accuracy of 98.73% for Class I and 96.37% for Class II alleles. We present an improved integrative approach that outperforms existing tools and predicts HLA alleles with higher fidelity for both Class I and Class II alleles.
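The abstract describes genotype calling as a combination of read-alignment evidence and allele-frequency priors. A minimal sketch of that general idea is shown below, assuming a simple Bayesian model that is *not* taken from this article: each read is represented by the set of reference alleles it aligns to equally well, each read comes from either allele of a pair with probability 0.5, a read "matches" an allele with probability `1 - eps` (an assumed error rate) and otherwise `eps`, and the prior on an allele pair follows Hardy-Weinberg proportions computed from hypothetical ethnicity-specific frequencies. All function names, parameters and allele frequencies here are illustrative only.

```python
"""Hypothetical sketch: choose the most probable HLA allele pair at one locus
by combining read-alignment support with population-frequency priors.
Illustrative model only; not the method described in this article."""
import math
from itertools import combinations_with_replacement


def log_likelihood(reads, a1, a2, eps=0.01):
    """Log P(reads | a1, a2) under an assumed error model: each read comes from
    either allele with probability 0.5 and matches it with probability 1 - eps."""
    total = 0.0
    for hits in reads:  # hits: set of alleles this read aligned to equally well
        p1 = (1 - eps) if a1 in hits else eps
        p2 = (1 - eps) if a2 in hits else eps
        total += math.log(0.5 * p1 + 0.5 * p2)
    return total


def log_prior(a1, a2, freqs, floor=1e-5):
    """Hardy-Weinberg prior from (hypothetical) ethnicity-specific allele
    frequencies; alleles missing from the table get a small floor frequency."""
    f1, f2 = freqs.get(a1, floor), freqs.get(a2, floor)
    p = f1 * f2 if a1 == a2 else 2 * f1 * f2
    return math.log(p)


def call_genotype(reads, candidates, freqs, eps=0.01):
    """Return the allele pair maximising log-likelihood + log-prior."""
    best_pair, best_score = None, -math.inf
    for a1, a2 in combinations_with_replacement(sorted(candidates), 2):
        score = log_likelihood(reads, a1, a2, eps) + log_prior(a1, a2, freqs)
        if score > best_score:
            best_pair, best_score = (a1, a2), score
    return best_pair, best_score


if __name__ == "__main__":
    # Toy data: 25 reads support only A*01:01; 30 reads cannot distinguish
    # A*02:01 from A*02:06, so the frequency prior breaks the tie.
    reads = [{"A*01:01"}] * 25 + [{"A*02:01", "A*02:06"}] * 30
    freqs = {"A*02:01": 0.28, "A*01:01": 0.15, "A*02:06": 0.01}  # made-up values
    pair, _ = call_genotype(reads, {"A*01:01", "A*02:01", "A*02:06"}, freqs)
    print(pair)  # ('A*01:01', 'A*02:01')
```

In the toy example, the pairs (A*01:01, A*02:01) and (A*01:01, A*02:06) explain the reads equally well, so the call is decided entirely by the assumed frequency prior. This illustrates why frequency information can help when alleles share identical sequence over the sequenced region. Working in log space avoids numerical underflow when many reads are multiplied together.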
KEYWORDS
antigen discovery, bioinformatics, HLA typing, neoantigen, NGS

1 | INTRODUCTION

The major histocompatibility complex (MHC) region, called the human leukocyte antigen (HLA) region in humans, is located on the short arm of chromosome 6 (6p21.3) and encodes cell-surface proteins that play a crucial role in the adaptive immune system. The genes of the HLA region are important for antigen presentation and fall into two main classes, Class I and Class II. The proteins encoded by Class I and Class II genes present self and antigenic peptides to receptors on other immune cells.1,2 This antigen presentation mechanism guides immune responses against diverse pathogens, such as bacteria and viruses, as well as against malignant cells, and can determine the outcome of organ and stem-cell transplantation.3,4 The HLA genes have been shown to play a key role in autoimmune and infectious diseases.3,5,6 In addition, HLA mutations have been implicated in cancer progression, and loss or down-regulation of HLA Class I antigens in tumor cells is an important mechanism of cancer immune escape.7-10

ABBREVIATIONS: DB, Database; HLA, Human leukocyte antigen; MHC, Major histocompatibility complex; NGS, Next-generation sequencing; WES, Whole-exome sequencing; WGS, Whole-genome sequencing; PCR, Polymerase chain reaction; ILP, Integer linear programming; IPD-IMGT/HLA, ImMunoGeneTics project/human leukocyte antigen; SBT, Sequence-based typing; SSP, Sequence-specific primer; SSO, Sequence-specific oligonucleotide.

Received: 10 May 2019 | Revised: 27 August 2019 | Accepted: 4 September 2019
DOI: 10.1111/tan.13685
© 2019 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd
HLA. 2019;1–10. wileyonlinelibrary.com/journal/tan