ORIGINAL ARTICLE
Improved HLA typing of Class I and Class II alleles from next-generation sequencing data
Angelina Sverchkova | Irantzu Anzar | Richard Stratford | Trevor Clancy
NEC OncoImmunity AS, Oslo Cancer
Cluster, Innovation Park, Oslo, Norway
Correspondence
Trevor Clancy, NEC OncoImmunity AS,
Oslo Cancer Cluster, Innovation Park, Oslo,
Norway.
Email: trevor@oncoimmunity.com
Funding information
Norges Forskningsråd, Grant/Award
Number: Nærings PhD; Norwegian Research
Council
Precise HLA genotyping is of great clinical importance but remains a challenging
bioinformatics endeavor because of the hyperpolymorphism of the HLA region. The
ever-increasing availability of next-generation sequencing (NGS) solutions has
spurred the development of several computational methods for predicting HLA
genotypes from NGS data. Although some of these tools genotype HLA Class I
alleles reasonably well, there is a need to incorporate integrative parameters related
to ethnicity-specific allele frequency information, in order to improve performance
for both Class I and Class II alleles. Here, we present a bioinformatics method that
addresses some of the current shortfalls in HLA genotyping from NGS. First, reads
that map to the HLA region are aligned against a comprehensive library of reference
HLA alleles. The allele type is then determined on the basis of the distribution of
aligned reads and the prior probabilities derived from the ethnic frequencies of
alleles. Three public NGS datasets were used to benchmark the approach against
six similar tools. The method outlined in this manuscript displayed an overall
accuracy of 98.73% for Class I and 96.37% for Class II alleles. This integrative
approach outperforms existing tools and predicts HLA alleles with improved
fidelity for both Class I and Class II alleles.
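As a rough illustration of how read support and a population prior might be combined, the following Python sketch ranks candidate alleles by the sum of the log fraction of aligned reads and the log of an ethnicity-specific allele frequency. It is not the scoring function of the method presented here; the function name `rank_alleles`, the `pseudo_freq` fallback, and all read counts and frequencies are hypothetical values chosen purely for illustration.

```python
import math


def rank_alleles(read_counts, population_freq, pseudo_freq=1e-4):
    """Rank candidate alleles by log(read fraction) + log(prior frequency).

    read_counts: dict mapping allele name -> number of aligned reads.
    population_freq: dict mapping allele name -> assumed prior frequency
        in the relevant population.
    pseudo_freq: assumed fallback prior for alleles missing from the table.
    """
    total = sum(read_counts.values())
    if total == 0:
        return []

    scores = {}
    for allele, n_reads in read_counts.items():
        if n_reads == 0:
            continue  # no read support, so the allele is not a candidate
        read_fraction = n_reads / total
        prior = population_freq.get(allele, pseudo_freq)
        scores[allele] = math.log(read_fraction) + math.log(prior)

    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


# Invented numbers, not taken from any dataset or frequency database.
read_counts = {"A*02:01": 480, "A*02:06": 470, "A*24:02": 450}
population_freq = {"A*02:01": 0.27, "A*02:06": 0.01, "A*24:02": 0.09}

ranking = rank_alleles(read_counts, population_freq)
print(ranking)  # ['A*02:01', 'A*24:02', 'A*02:06']
```

In this toy example, A*02:06 has more aligned reads than A*24:02 but ranks lower because of its small assumed prior frequency, which is the kind of effect a frequency prior is meant to have when closely related alleles attract similar numbers of reads.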
KEYWORDS
antigen discovery, bioinformatics, HLA typing, neoantigen, NGS
1 | INTRODUCTION
The major histocompatibility complex (MHC) region,
named the human leukocyte antigen (HLA) in humans, is
located on the short arm of chromosome 6 (6p21.3) and
encodes cell-surface proteins that play a crucial role in the
adaptive immune system. The genes of the HLA region are
important for antigen presentation and are categorized into
two main classes, Class I and Class II. The proteins
encoded by Class I and Class II genes present self- and
antigenic peptides to receptors on other immune cells.1,2 This
antigen presentation mechanism guides immune responses
against diverse pathogens, such as bacteria and viruses, and
against malignant cancer cells, and can determine the outcome of
organ and stem-cell transplantation.3,4 The HLA genes have
been shown to play a key role in autoimmune and infectious
diseases.3,5,6 In addition, HLA mutations have been
implicated in cancer progression, and loss or down-regulation of
HLA Class I antigens in tumor cells represents an important
cancer immune escape mechanism.7-10
ABBREVIATIONS: DB, Database; HLA, Human leukocyte antigen;
MHC, Major histocompatibility complex; NGS, Next-generation
sequencing; WES, Whole-exome sequence; WGS, Whole-genome
sequence; PCR, Polymerase chain reaction; ILP, Integer linear
programming; IPD-IMGT/HLA, ImMunoGeneTics project/human
leukocyte antigen; SBT, Sequence-based typing; SSP, Sequence-specific
primer; SSO, Sequence-specific oligonucleotide.
Received: 10 May 2019 Revised: 27 August 2019 Accepted: 4 September 2019
DOI: 10.1111/tan.13685
© 2019 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd
HLA. 2019;1–10. wileyonlinelibrary.com/journal/tan