Indian J Dairy Sci 73(3): 250-255
RESEARCH ARTICLE

Effect of composition and size of the reference population in genotype imputation efficiency of INDUSCHIP in HF Crossbred cattle

Sujit Saha, Nilesh Nayee, Heena Shah, Swapnil Gajjar, G Kishore, R O Gupta and K R Trivedi

National Dairy Development Board, Anand-388 001, Gujarat, India

Sujit Saha, Animal Breeding, National Dairy Development Board, Anand-388 001, Gujarat, India
Email: ssaha@nddb.coop; sujitsahaabc@gmail.com

Received: 09 February 2020 / Accepted: 09 March 2020 / Published online: 12 July 2020
© Indian Dairy Association (India) 2020

Abstract: The objective of this study was to investigate the effect of the composition and size of the reference population on the imputation efficiency of INDUSCHIP v2 in Indian HF crossbred cattle. The data set consisted of 869 cattle from 14 Indicine breeds, 2 crossbreds (HF and Jersey crossbreds) and 2 exotic breeds (HF and Jersey), genotyped with the Illumina BovineHD panel (Illumina, San Diego, CA). After quality control (QC), 846 animals and 449955 SNPs remained for the imputation study. Three test groups were created, each of 25 randomly selected HF crossbred (HFCB) animals whose HD genotypes were reduced to the INDUSCHIP v2 SNP subset. From the HD genotypes of the remaining animals, three categories of reference group were created: reference 1 (HF, Jersey, all 14 Indicine breeds, HF and Jersey crossbreds), reference 2 (HF, HF crossbred, Sahiwal, Gir and Kankrej) and reference 3 (pure HF, Sahiwal, Gir and Kankrej). Imputation efficiency of INDUSCHIP v2 was expressed in terms of concordance rate and dosage R² (DR2). Reference groups 1 and 2 gave better imputation efficiency than reference group 3. Further, the size of the reference population had an impact on imputation efficiency: the concordance rate and DR2 decreased as the size of the reference population declined. However, a reference population of 280 animals was sufficient to obtain a concordance rate of about 95% or more and a DR2 of about 0.93.
More HF, HF crossbred, Sahiwal, Gir and Kankrej animals need to be HD genotyped and incorporated in the reference population to improve the imputation efficiency of INDUSCHIP v2.

Keywords: Crossbred cattle, Genotype imputation, HD chip, LD chip, Single nucleotide polymorphism, Reference population

Introduction

In genomic selection, evenly spaced DNA markers (single nucleotide polymorphisms, SNPs) spread across the genome are used to estimate genomic breeding values (GEBV) for the target individuals (Meuwissen et al. 2016). Genomic information from dense SNP chips provides the opportunity to increase the rate of genetic progress in breeding programs if a sufficient number of markers are used and enough animals with phenotypes are genotyped (Carvalheiro et al. 2014). More markers mean greater linkage disequilibrium between SNPs and a better chance of capturing genomic variation. Because genotyping with HD SNP panels is expensive, the number of animals that can be genotyped is limited. Hence, in practice, a cost-effective alternative called genotype imputation is preferred.

Genotype imputation makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative sample of individuals genotyped at higher density (Pausch 2013). This not only makes it possible to increase the genomic information and predict missing genotypes (Marchini and Howie 2010) but also to reduce genotyping costs and intensify genomic selection (Ventura et al. 2014) by genotyping more animals and combining data from different breeds (Larmer et al. 2014). The imputation efficiency of any chip depends upon several factors, namely the imputation method, the software used for imputation, the minor allele frequency (MAF) of the SNP to be imputed, linkage disequilibrium between SNPs, the chromosomal position of the SNP, the quality of SNP maps, and the size and composition of the reference population (Schrooten et al. 2014).
To implement genomic selection in India for indicus breeds and their taurine crosses, a customized medium-density chip, INDUSCHIP v1, has been developed (Mrode et al. 2019). It consists of 45700 SNPs sampled from the HD genotypes of mainly four indicus breeds (Gir, Sahiwal, Kankrej and Red Sindhi) and their taurine crosses (HF and Jersey crossbreds). The genotyping chip contained around 41000 SNPs from the HD data that have high MAF (0.25) and are uniformly distributed across the genome for all the breeds under study, with an average distance of around 65 kb between two consecutive SNPs. In addition to the above, 2000 ancestry
https://doi.org/10.33785/IJDS.2020.v73i03.010