Indian J Dairy Sci 73(3): 250-255
RESEARCH ARTICLE
Effect of composition and size of the reference population in genotype
imputation efficiency of INDUSCHIP in HF Crossbred cattle
Sujit Saha, Nilesh Nayee, Heena Shah, Swapnil Gajjar, G Kishore, R O Gupta and K R Trivedi
National Dairy Development Board, Anand-388 001, Gujarat, India
Sujit Saha (corresponding author)
Animal Breeding), National Dairy Development Board, Anand-388
001, Gujarat, India
Email: ssaha@nddb.coop ; sujitsahaabc@gmail.com
Received: 09 February 2020 / Accepted: 09 March 2020 / Published online: 12 July 2020
© Indian Dairy Association (India) 2020
Abstract: The objective of this study was to investigate the
effect of the composition and size of the reference population on
the imputation efficiency of INDUSCHIP v2 in Indian HF crossbred
cattle. The data set consisted of 869 cattle from 14 indicine
breeds, 2 crossbreds (HF and Jersey crossbreds) and 2 exotic
breeds (HF, Jersey) genotyped with the Illumina BovineHD (Illumina,
San Diego, CA) panel. After quality control (QC), 846 animals and
449955 SNPs remained for the imputation study. Three test groups
were created, each consisting of 25 randomly selected HF crossbred
(HFCB) animals whose genotypes were reduced to the INDUSCHIP v2
SNP subset. From the HD genotypes of the remaining animals, three
categories of reference group were created, namely reference 1
(HF, Jersey, all 14 indicine breeds, HF and Jersey crossbreds),
reference 2 (HF, HF crossbred, Sahiwal, Gir and Kankrej) and
reference 3 (pure HF, Sahiwal, Gir and Kankrej).
Imputation efficiency of INDUSCHIP v2 was expressed in terms
of concordance rate and dosage R2 (DR2). Reference groups 1
and 2 performed better than reference group 3. Further, the size
of the reference population affected imputation efficiency: the
concordance rate and DR2 decreased as the reference population
became smaller. However, a reference population of 280 animals
was sufficient to obtain a concordance rate of around 95% or more
and a DR2 of around 0.93. More HF, HF crossbred, Sahiwal, Gir and
Kankrej animals need to be HD genotyped and incorporated into the
reference population to improve the imputation efficiency of
INDUSCHIP v2.
Keywords: Crossbred cattle, Genotype Imputation, HD chip,
LD chip, Single Nucleotide Polymorphism, Reference population
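The two efficiency measures named in the abstract can be illustrated with a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele and imputed dosages on the same 0 to 2 scale; the function names and toy data are hypothetical and are not taken from the study. The second function is only a stand-in for DR2: it computes the squared correlation between imputed dosages and known genotypes, whereas the DR2 reported by imputation software is typically an estimate of that quantity made without access to the true genotypes.

```python
from typing import Optional, Sequence


def concordance_rate(true_gt: Sequence[Optional[int]],
                     imputed_gt: Sequence[Optional[int]]) -> float:
    """Fraction of comparable genotypes (coded 0/1/2) imputed correctly.

    Positions where either genotype is missing (None) are skipped.
    """
    pairs = [(t, i) for t, i in zip(true_gt, imputed_gt)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


def dosage_r2(true_gt: Sequence[int], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Illustrative stand-in for DR2 (assumption, see the text above).
    Returns NaN when either vector has no variation.
    """
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_d = sum(dosage) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_gt, dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_d = sum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)


# Toy data for one animal at eight masked SNPs (hypothetical values).
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_genotypes = [0, 1, 2, 1, 0, 1, 1, 2]
imputed_dosages = [0.1, 1.0, 1.9, 0.9, 0.0, 1.2, 1.1, 1.6]

print(concordance_rate(true_genotypes, imputed_genotypes))  # 6 of 8 match
print(dosage_r2(true_genotypes, imputed_dosages))
```

In a design like the one described, such measures would be computed only at the SNPs masked in the test animals, that is, HD SNPs not present on the lower-density subset.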
Introduction
Under genomic selection, evenly spaced DNA markers (single
nucleotide polymorphisms, SNPs) spread across the genome are
used to estimate genomic breeding values (GEBV) for the target
individuals (Meuwissen et al. 2016). Genomic information from dense
SNP chips provides the opportunity to increase the rate of genetic
progress in breeding programs if a sufficient number of markers
is used and enough animals with phenotypes are genotyped
(Carvalheiro et al. 2014). A larger number of markers means
stronger linkage disequilibrium between adjacent SNPs and a better
chance of capturing genomic variation. Since genotyping with HD
SNP panels is expensive, it limits the number of animals that can
be genotyped. Hence, in practice, a cost-effective alternative
called genotype imputation is preferred. Genotype imputation makes
it possible to extrapolate genotypes from lower to higher density
arrays based on a representative sample of individuals genotyped
at higher density (Pausch 2013). This not only makes it possible
to increase the genomic information and predict missing genotypes
(Marchini and Howie 2010) but also to reduce genotyping costs and
intensify genomic selection (Ventura et al. 2014) by genotyping
more animals and combining data from different breeds (Larmer
et al. 2014). The imputation efficiency of any chip depends upon
several factors, namely the imputation method, the software used
for imputation, the minor allele frequency (MAF) of the SNP to be
imputed, linkage disequilibrium between SNPs, the chromosomal
position of the SNP, the quality of SNP maps, and the size and
composition of the reference population (Schrooten et al. 2014).
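Of the factors listed above, the MAF of a SNP is the simplest to compute directly from genotype data. The following is a minimal sketch, assuming genotypes coded as 0, 1 or 2 copies of one allele with None for missing calls; the function name and example values are hypothetical.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF of one SNP from genotypes coded as allele counts (0/1/2).

    Missing calls (None) are ignored. The frequency of the counted
    allele is computed first, then the smaller of it and its complement
    is returned.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    freq = sum(called) / (2 * len(called))  # two alleles per animal
    return min(freq, 1.0 - freq)


# Hypothetical SNP genotyped in four animals: 7 of 8 alleles are the
# counted allele, so the other allele is the minor one.
print(minor_allele_frequency([2, 2, 1, 2]))
```

SNPs with a low MAF carry few copies of the minor allele in the reference population, which is one reason they tend to be harder to impute accurately.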
To implement genomic selection in India for indicus breeds and
their taurine crosses, a medium-density customized chip, INDUSCHIP
v1, consisting of 45700 SNPs sampled from HD genotypes of mainly
four indicus breeds (Gir, Sahiwal, Kankrej, Red Sindhi) and their
taurine crosses (HF cross and Jersey cross), has been developed
(Mrode et al. 2019). The genotyping chip contained around 41000
SNPs from HD data having high MAF (0.25), uniformly distributed
across the genome for all the breeds under study, with an average
distance of around 65 kb between two consecutive SNPs. In addition
to the above, 2000 ancestry
https://doi.org/10.33785/IJDS.2020.v73i03.010