Proceedings, 10th World Congress of Genetics Applied to Livestock Production

Genomic Prediction with 12.5 Million SNPs for 5503 Holstein Friesian Bulls

R. van Binsbergen *,†, M.P.L. Calus *, M.C.A.M. Bink †, C. Schrooten ‡, F.A. van Eeuwijk †, R.F. Veerkamp *.

* Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, the Netherlands, † Biometris, Wageningen UR, Wageningen, the Netherlands, ‡ CRV, Arnhem, the Netherlands

ABSTRACT: This study reports the first preliminary results of genomic prediction with whole-genome sequence data (12,590,056 SNPs) for 5503 bulls with accurate phenotypes. Two methods were compared: genome-enabled best linear unbiased prediction (GBLUP) and a Bayesian approach (BSSVS). Results were compared with results using BovineHD genotypes (631,428 SNPs). Results were reported for somatic cell score, interval between first and last insemination, and protein yield. For all traits and both methods, genomic prediction with sequence data gave results similar to those with BovineHD, and GBLUP gave results similar to those of BSSVS. However, it remains to be seen whether the reliability of BSSVS with sequence data will improve after more sampling cycles have been finished.

Key words: whole-genome sequence; Bayesian stochastic search variable selection; GBLUP.

INTRODUCTION

The use of whole-genome sequence data with millions of SNPs, including the actual causal mutations, instead of the currently used SNP chips might lead to higher reliability of genomic prediction (e.g. Meuwissen and Goddard (2010)). Whether this will be achieved in a dairy cattle population with strong family relationships remains an open question. Different methods are available for genomic prediction, of which linear regression is used most often (for a review: de los Campos et al. (2013)). However, not all of these methods take full advantage of the sequence data.
This study reports the first results of genomic prediction with 12.5 million SNPs for 5503 bulls with accurate phenotypes. Two methods were compared: genome-enabled best linear unbiased prediction (GBLUP) and Bayes stochastic search variable selection (BSSVS; e.g. Verbyla et al. (2009)).

MATERIALS AND METHODS

Phenotypes. De-regressed proofs (DRP) and the associated weights (effective daughter contributions; EDC) from 5503 Holstein Friesian bulls were available for somatic cell score (SCS), interval between first and last insemination (IFL), and protein yield (PY). The data were provided by CRV (Arnhem, the Netherlands). DRP were calculated according to VanRaden et al. (2009):

$$\mathrm{DRP} = \mathrm{PA} + (\mathrm{EBV} - \mathrm{PA}) \cdot \left( \frac{\mathrm{EDC}_{EBV}}{\mathrm{EDC}_{prog}} \right)$$

where PA is parent average, EBV is the estimated breeding value for a trait, and EDC is the effective daughter contribution. $\mathrm{EDC}_{EBV}$ is calculated according to VanRaden and Wiggans (1991) as $\lambda \, \mathrm{REL}_{EBV} / (1 - \mathrm{REL}_{EBV})$, where $\mathrm{REL}_{EBV}$ is the published reliability for EBV and $\lambda = (4 - h^2)/h^2$, where $h^2$ is the heritability of the trait. $\mathrm{EDC}_{prog} = \mathrm{EDC}_{EBV} - \mathrm{EDC}_{PA}$, where $\mathrm{EDC}_{PA} = \lambda \, \mathrm{REL}_{PA} / (1 - \mathrm{REL}_{PA})$ and $\mathrm{REL}_{PA} = (\mathrm{REL}_{sire} + \mathrm{REL}_{dam})/4$ (VanRaden and Wiggans (1991)). Average $\mathrm{EDC}_{EBV}$ (and range) for animals in the training population was 251 (24 – 971) for SCS; 560 (37 – 4851) for IFL; and 235 (23 – 693) for PY.

Genotypes. Each bull was either genotyped with the Illumina BovineHD BeadChip (Illumina Inc., San Diego, CA) or genotyped with a 50k SNP panel and imputed to BovineHD (777k SNPs). All BovineHD genotypes (734,403 SNPs) were imputed to whole-genome sequence (28,336,153 SNPs) using Beagle software (Browning and Browning (2013)). Whole-genome sequence data of 429 individuals (including 121 Holstein Friesian) were used as the reference for imputation. These data were provided by the 1000 bull genomes project (Run 3.0). Each individual was sequenced with Illumina HiSeq Systems (Illumina Inc., San Diego, CA).
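As an illustration of the de-regression described under Phenotypes, the following is a minimal Python sketch rather than the authors' implementation. It assumes the de-regression takes the form DRP = PA + (EBV − PA) · EDC_EBV / EDC_prog, with EDC obtained from reliabilities as λ · REL / (1 − REL) and λ = (4 − h²)/h². The function names, argument names, and example values are hypothetical and were chosen only for illustration.

```python
def edc_from_reliability(rel, h2):
    """Effective daughter contribution implied by a reliability.

    Assumes EDC = lambda * REL / (1 - REL) with lambda = (4 - h2) / h2.
    """
    if not 0.0 <= rel < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    lam = (4.0 - h2) / h2
    return lam * rel / (1.0 - rel)


def deregressed_proof(ebv, pa, rel_ebv, rel_sire, rel_dam, h2):
    """Return (DRP, EDC_prog) for one bull under the assumed formulas."""
    # Reliability of the parent average from the parents' reliabilities.
    rel_pa = (rel_sire + rel_dam) / 4.0
    edc_ebv = edc_from_reliability(rel_ebv, h2)
    edc_pa = edc_from_reliability(rel_pa, h2)
    # Information contributed by progeny only (parental part removed).
    edc_prog = edc_ebv - edc_pa
    if edc_prog <= 0.0:
        raise ValueError("no progeny information left after removing PA")
    # Scale the deviation from parent average up by EDC_EBV / EDC_prog.
    drp = pa + (ebv - pa) * edc_ebv / edc_prog
    return drp, edc_prog


# Hypothetical bull: h2 = 0.25, EBV = 3.0, PA = 1.0.
drp, edc_prog = deregressed_proof(
    ebv=3.0, pa=1.0, rel_ebv=0.9, rel_sire=0.9, rel_dam=0.3, h2=0.25
)
```

Because EDC_prog is always smaller than EDC_EBV, the ratio exceeds one, so the DRP lies further from the parent average than the EBV does. The returned EDC_prog is the kind of quantity that could serve as a weight in the prediction models.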
Alignment, variant calling, and quality controls were described by Daetwyler et al. (2014). After imputation, SNPs with a minor allele frequency below 0.005 or an imputation accuracy (squared correlation between estimated allele dosage and true allele dosage as predicted by Beagle; Li et al. (2010)) below 0.05 were deleted. These criteria were chosen to remove SNPs that did not segregate in the data or that were very likely to be imputed incorrectly.

Genomic prediction. Two linear regression models were used: GBLUP and BSSVS. With GBLUP, all SNPs are assumed to have an equally small effect, while with BSSVS it is assumed that a large number of SNPs have almost no effect and a few SNPs have a moderate effect. The GBLUP model was as follows:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{g} + \mathbf{e}$$

where $\mathbf{y}$ contains the DRPs of all individuals, $\mu$ is the overall mean, $\mathbf{1}$ is a vector of ones, $\mathbf{g}$ is a vector of the direct genomic values of all individuals, $\mathbf{Z}$ is a matrix that allocates the direct genomic values to the individuals, and $\mathbf{e}$ contains the random residuals. Additive genetic effects were assumed to be distributed as $\mathbf{g} \sim N(\mathbf{0}, \mathbf{GRM} \cdot \sigma_g^2)$, where GRM is the genomic relationship matrix calculated following Yang et al. (2010), and $\sigma_g^2$ is the additive genetic variance. Residual effects were assumed to be distributed as $\mathbf{e} \sim N(\mathbf{0}, \mathbf{D}^{-1} \cdot \sigma_e^2)$, where $\mathbf{D}^{-1}$ is a diagonal matrix containing $1/\mathrm{EDC}$ on the diagonal, and $\sigma_e^2$ is the residual variance. After calculation of the GRM, the GBLUP model was applied using ASReml (Gilmour et al. (2009)).

The BSSVS model was as follows:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\alpha} + \mathbf{e}$$

where $\mathbf{y}$ contains the DRPs of all individuals, $\mu$ is the overall mean, $\mathbf{1}$ is a vector of ones, $\mathbf{X}$ is a matrix that contains the genotypes of all individuals, $\boldsymbol{\alpha}$ contains the (random) allele substitution effects for all SNPs, and $\mathbf{e}$ contains the random residuals. An important aspect of the model is that the prior