Dynamic Expedition of Leading Mutations in SARS-CoV-2 Spike Glycoproteins

Zhouyi He,*,1,2 Muhammad Hasan,*,1,2 Mengqi Jia,*,1 Kathiresan Natarajan,3 Shan Qi Yap,1 Feng Zhou,1 Hailei Su,†,4 Kaicheng Zhu,†,1 and Haibin Su†,1,2

1 Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
2 Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
3 Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram - 695014, Kerala, India
4 Bengbu Hospital of Traditional Chinese Medicine, 4339 Huai-shang Road, Anhui 233080, China

(Dated: December 29, 2021)

During the ongoing CoVID-19 epidemic, the continuous genomic evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been generating new variants with enhanced transmissibility and immune escape. Because the spike glycoprotein is a key target of antibodies, its mutations play a vital role in the trajectory of viral evasion. Here, we present a time-resolved statistical method, dynamic expedition of leading mutations (deLemus), to analyze the evolutionary dynamics of the spike protein. Together with an analysis of single amino-acid polymorphism (SAP), we propose an L-index that quantifies the mutation strength of each amino acid in order to unravel the mutation pattern of the spike glycoprotein. The sites of interest (SOI) with a high L-index hold promise for detecting potential signals of emergent variants.

Introduction

The advent of coronavirus disease 2019 (CoVID-19) in December 2019, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and its subsequent global spread have made it a major global threat. [1] The epidemic has not only had a significant impact on economic and social activities but has also claimed millions of lives.
[2] Therefore, efforts such as mass vaccination and lockdowns have been made to mitigate the effects of this massive outbreak.

Coronavirus genomes range from approximately 26 to 32 kilobases, among the largest of the RNA viruses. [3] Together with the translational frameshifting mechanism in open reading frames 1a and 1b, this large genome allows coronaviruses to exhibit diversity in their encoded proteins and corresponding functions. Owing to the 3′-5′ exonuclease proofreading activity of nonstructural protein 14, SARS-CoV-2 has a lower substitution rate than other RNA viruses. [4–6] However, this proofreading function cannot repair deletions, so viral protein sequences still change over time. The SARS-CoV-2 genome mutation rate is on the order of 10⁻³ nucleotide substitutions per nucleotide site per year. [7] Recombination between variants may also add to the virulence and severity of SARS-CoV-2. [8] The potential recombination region is found inside the spike glycoprotein's receptor-binding domain (RBD), suggesting that the virus might evolve in a directed manner. The emergence of deletions can provide a new pathway for the evolution of the coronavirus, leading to highly transmissible variants.

As the pandemic rages on, the virus evolves and its genetic diversity increases. Evolution proceeds when abundant genetic diversity meets selection pressure. Continuous genomic surveillance is possible thanks to high-throughput worldwide genome sequencing and data sharing via GISAID. [9] Over a hundred additional SARS-CoV-2 lineages have emerged in the last two years. [10] Several variants, including α (B.1.1.7), β (B.1.351), γ (P.1), δ (B.1.617.2), and ο (B.1.1.529), were declared variants of concern (VoC) by the World Health Organization (WHO) [11] because they caused resurgences of CoVID-19. Additionally, mutations in these variants could make them more resistant to vaccines.
[*] These authors contribute equally.
[†] Corresponding authors.

bioRxiv preprint doi: https://doi.org/10.1101/2021.12.29.474427; this version posted December 29, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

The relentless mutation of this coronavirus complicates vaccine and drug research, yet the complexity of biological science and cognitive biases prevent the quantitative capture of viral evolutionary dynamics. [12] Across the genome landscape of SARS-CoV-2, the coding region of the spike glycoprotein, the key mediator of viral entry into and fusion with target cells, is the most active region as measured by residue diversity. The spike glycoprotein is a trimeric type I viral fusion protein that binds the virus to the angiotensin-converting enzyme 2 (ACE2) receptor on host cells. [13–16] Spike proteins are glycosylated with N-