Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes

Simone Rubinacci¹,², Robin Hofmeister¹,², Bárbara Sousa da Mota¹,², Olivier Delaneau¹,²,*

¹ Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
² Swiss Institute of Bioinformatics, Lausanne, Switzerland
* Corresponding author (olivier.delaneau@unil.ch)

Abstract

Recent work highlights the advantages of low-coverage whole genome sequencing (lcWGS), followed by genotype imputation, as a cost-effective genotyping technology for statistical and population genetics. The release of whole genome sequencing data for 150,119 UK Biobank (UKB) samples represents an unprecedented opportunity to impute lcWGS with high accuracy. However, despite recent progress [1,2], current methods struggle to cope with the growing numbers of samples and markers in modern reference panels, resulting in unsustainable computational costs. For instance, the imputation cost for a single genome is 1.11£ using GLIMPSE v1.1.1 (GLIMPSE1) on the UKB research analysis platform (RAP) and rises to 242.8£ using QUILT v1.0.4. To overcome this computational burden, we introduce GLIMPSE v2.0.0 (GLIMPSE2), a major improvement of GLIMPSE, that scales sublinearly in both the number of samples and markers. GLIMPSE2 imputes a low-coverage genome from the UKB reference panel for only 0.08£ in compute cost while retaining high accuracy for both ancient and modern genomes, particularly at rare variants (MAF < 0.1%) and for very low-coverage samples (0.1x-0.5x).

Main

To demonstrate the benefits of using sequenced biobanks for lcWGS imputation, we phased the recent release of the UK Biobank (UKB) WGS data [3,4] using SHAPEIT5 [5] and created a UKB reference panel of 280,238 haplotypes and 582,534,516 markers (Supplementary Note S1).
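To put the per-genome costs quoted in the abstract in perspective, the short sketch below computes how much more expensive GLIMPSE1 and QUILT v1.0.4 are relative to GLIMPSE2, and what a cohort would cost to impute. The three prices come from the text. The function names, the cohort size of 10,000 genomes, and the assumption that total cost grows linearly with the number of target genomes are illustrative choices, not results reported here.

```python
# Per-genome imputation cost on the UKB RAP, in GBP, as quoted in the abstract.
COST_PER_GENOME_GBP = {
    "GLIMPSE1": 1.11,
    "QUILT v1.0.4": 242.8,
    "GLIMPSE2": 0.08,
}


def cost_ratio(method, baseline="GLIMPSE2"):
    """Return how many times more expensive `method` is than `baseline`."""
    return COST_PER_GENOME_GBP[method] / COST_PER_GENOME_GBP[baseline]


def cohort_cost(method, n_genomes):
    """Total cost in GBP for `n_genomes` targets.

    Assumption (not from the paper): cost scales linearly with the
    number of target genomes.
    """
    return COST_PER_GENOME_GBP[method] * n_genomes


if __name__ == "__main__":
    n = 10_000  # hypothetical cohort size, chosen only for illustration
    for method in COST_PER_GENOME_GBP:
        print(
            f"{method}: {cost_ratio(method):.1f}x the GLIMPSE2 cost, "
            f"{cohort_cost(method, n):,.0f} GBP for {n:,} genomes"
        )
```

Under these assumptions, GLIMPSE1 is roughly 14 times and QUILT v1.0.4 roughly 3,000 times more expensive per genome than GLIMPSE2.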
We used the UKB panel to impute lcWGS samples with GLIMPSE2 and other recently released imputation methods: GLIMPSE1 [1] and QUILT v1.0.4 [2]. Compared to other reference panels, the UKB leads to considerable accuracy improvements for British samples across all tested depths of coverage. Furthermore,

bioRxiv preprint doi: https://doi.org/10.1101/2022.11.28.518213; this version posted November 29, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.