Research Article
K-Mer Spectrum-Based Error Correction Algorithm for
Next-Generation Sequencing Data
Hussah N. AlEisa,1 Safwat Hamad,2 and Ahmed Elhadad3
1 Department of Computer Sciences, College of Computer and Information Sciences,
Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
2 Department of Scientific Computing, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
3 Department of Computer Science, Faculty of Computers and Information, South Valley University, Qena, Egypt
Correspondence should be addressed to Hussah N. AlEisa; haleisa@pnu.edu.sa
Received 18 May 2022; Accepted 13 June 2022; Published 14 July 2022
Academic Editor: Wei Xiang
Copyright © 2022 Hussah N. AlEisa et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
In the mid-1970s, the first-generation sequencing technique (Sanger) was created. It used Applied Biosystems sequencing
devices and Beckman's GeXP genetic testing technology. The second-generation sequencing (2GS) technique arrived just several
years after the first human genome was published in 2003. 2GS devices are much faster than Sanger sequencing equipment, with
considerably cheaper manufacturing costs and far higher throughput in the form of short reads. The third-generation sequencing
(3GS) method, initially introduced in 2005, offers further reduced manufacturing costs and higher throughput. Even though
sequencing techniques have advanced through several generations, they remain error-prone due to the large number of reads.
The study of this massive amount of data will aid in the decoding of life's secrets, the detection of infections, the
development of improved crops, and the improvement of life quality, among other things. This is a challenging task,
complicated not just by the large number of reads but also by the occurrence of sequencing errors. As a result, error
correction is a crucial task in data processing; it entails identifying and correcting read errors. As demonstrated in this
work, the performance of the various k-spectrum-based error correction algorithms can be influenced by a variety of
characteristics such as coverage depth, read length, and genome size. Consequently, time and effort must be put into
selecting appropriate error correction approaches for particular NGS data.
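As a rough illustration of the k-spectrum idea referred to above, the following Python sketch counts every k-mer in a set of reads and flags the rare ones as likely sequencing errors. It is not the algorithm evaluated in this work. The function names (kmer_spectrum, weak_kmers, coverage_depth), the min_count threshold, and the toy reads are assumptions made for illustration only. Coverage depth is computed with the usual definition, number of reads times read length divided by genome size.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (hypothetical threshold).

    In k-spectrum methods, rare k-mers are treated as probable errors,
    because a true genomic k-mer should recur across overlapping reads.
    """
    return {kmer for kmer, c in counts.items() if c < min_count}


def coverage_depth(num_reads, read_length, genome_size):
    """Average coverage depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


# Toy data (assumed): three identical reads and one with a single substitution.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
spectrum = kmer_spectrum(reads, k=4)
suspects = weak_kmers(spectrum, min_count=2)
```

In this toy input, only the k-mers that overlap the substituted base fall below the threshold. A real corrector would then search for the smallest edit that turns those weak k-mers into well-supported ones, and the appropriate threshold would depend on coverage depth, read length, and genome size.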
1. Introduction
Nature Methods named next-generation high-throughput
DNA sequencing techniques as the method of the year in
2007. These methods are creating interesting new potential
in biology [1]. The road to acceptance for this
revolutionary technology, on the other hand, was not simple.
Until recently, the Sanger enzymatic dideoxy method, first
described in 1977, and the Maxam and Gilbert chemical
degradation technique, first described in the same year,
were the methodologies used for sequence analysis. The
Maxam and Gilbert chemical degradation technique was
used in sequencing cases that could not be solved easily with
the Sanger method [2]. The potential to decipher genomes
and conduct ground-breaking biomedical science has been
made possible by the rapid generation and accessibility of
enormous amounts of DNA sequence data obtained by
next-generation sequencing (NGS) technology at a lower cost
than traditional Sanger sequencing [3]. There has been a
significant shift away from using automated Sanger
sequencing for genome analysis in the last four years. Prior
to this departure, automated Sanger sequencing had
dominated the market for half a century, resulting in a slew of
significant achievements, such as the production of the only
completed human genome sequence.
Despite numerous technological advances during this
period, the drawbacks of automated Sanger sequencing
demonstrated the need for new and superior methods for
sequencing huge numbers of human genomes [4]. Sanger
sequencing has seen fewer documented advancements as
Hindawi
Computational Intelligence and Neuroscience
Volume 2022, Article ID 8077664, 8 pages
https://doi.org/10.1155/2022/8077664