Computers in Biology and Medicine 121 (2020) 103747
Available online 16 April 2020
0010-4825/© 2020 Elsevier Ltd. All rights reserved.
DNA methylation-based age prediction using cell separation algorithm
Najmeh Sadat Jaddi, Mohammad Saniee Abadeh *
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
ARTICLE INFO
Keywords:
Age prediction
Cell separation algorithm
Regression
DNA methylation data
ABSTRACT
The age of an individual can be predicted from the way DNA methylation changes with age. In this
paper, an age prediction method is developed to solve multivariate regression problems on DNA
methylation data by optimizing an artificial neural network (ANN) model with a newly proposed algorithm
named the Cell Separation Algorithm (CSA). The CSA imitates cell separation by differential
centrifugation, a process that involves multiple centrifugation steps with the rotor speed increased at each step.
Analogously to the centrifugal force, the CSA separates candidate solutions according to their objective function
values over several steps, with the velocity increasing at each step. First, 25 test functions are used to test the CSA.
Second, the CSA is examined on three age prediction problems drawn from two body fluids (blood and saliva):
healthy blood samples, diseased blood samples, and saliva samples are used to test the method’s capability. The
results of the CSA are compared not only with methods proposed in previous studies, but also with
results from stochastic gradient descent (SGD), ADAM, and a genetic algorithm (GA). The CSA model results are
substantially better than those of the four methods proposed in previous works, which did not use an ANN training
process. The CSA also outperformed SGD and ADAM, which employ the ANN model without meta-heuristic
optimization of the ANN. The CSA results are comparable (even superior) to those of the GA model, which takes
advantage of both ANN and meta-heuristics.
1. Introduction
Aging is a natural process in human life and is affected by many
factors such as genetics, environment, disease, and lifestyle [1]. Age
prediction from skeletal markers such as bones and teeth is limited by
the need for skeletal remains; subsequent research found that gene
expression, cellular structures, and telomeres in organisms change with
the aging process [2,3]. Age can therefore be predicted from this
information. Methylated DNA is biologically and chemically more stable
than other biomarkers [4].
Several previous studies on the selection of optimal markers
have used a large number of CpG sites to obtain significant
prediction accuracy [5,6]. Furthermore, the majority of studies
selected markers from those reported in previously published
research. Another approach is to employ datasets containing DNA
methylation values from identified CpG sites on the 27 K or 450 K
microarrays [7]. Because marker selection is important to the
accuracy of the prediction, an appropriate technique is needed to
determine the optimum markers for the prediction model.
Among the available methods, filter-based and wrapper-based
approaches are the most common. Filter-based methods evaluate
either every feature separately (univariate filters) or a subset of
the full feature set (multivariate filters). Filter-based methods measure
the fundamental properties of features related to class “bias”. These
methods are independent of any learning method [8]. In contrast,
wrapper-based methods combine the search with a learning method in a
single approach [9]. Such methods select the subset of genes that
gives the classification its maximum accuracy. Filter-based methods
have the advantage of less computation and faster execution time
compared to wrapper-based methods [8]. However, wrapper-based
approaches appear to provide more reliable results than filters
because they maximize accuracy directly.
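The contrast between the two families can be illustrated with a minimal sketch. It is not the method proposed in this paper: the data are a synthetic toy set rather than real methylation values, the function names (`filter_rank`, `wrapper_forward`, `loo_mae`) are hypothetical, and a 1-nearest-neighbour regressor stands in for the learning method purely for brevity. The filter scores each feature by its absolute Pearson correlation with age and never consults a learner, whereas the wrapper scores each candidate subset by the learner's own leave-one-out error.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_rank(X, y, k):
    """Univariate filter: rank features by |correlation with y|; no learner involved."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def loo_mae(X, y, features):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour regressor."""
    total = 0.0
    for i in range(len(X)):
        best_d, best_y = math.inf, 0.0
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_y = d, y[j]
        total += abs(y[i] - best_y)
    return total / len(X)

def wrapper_forward(X, y, k):
    """Wrapper: greedy forward selection, scoring each subset by the learner's error."""
    selected = []
    remaining = set(range(len(X[0])))
    while len(selected) < k and remaining:
        best = min(sorted(remaining), key=lambda f: loo_mae(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic toy data (an assumption for illustration only): 20 samples, 3 features.
# Feature 0 tracks age linearly; features 1 and 2 are deterministic pseudo-noise.
y = [float(age) for age in range(20, 80, 3)]
X = [[y[i] / 100.0, ((i * 7) % 11) / 11.0, ((i * 5) % 13) / 13.0]
     for i in range(len(y))]

print("filter picks: ", filter_rank(X, y, 1))
print("wrapper picks:", wrapper_forward(X, y, 1))
```

In this toy case both approaches select the age-tracking feature, but the wrapper pays for it with one full leave-one-out evaluation per candidate subset, which is the extra computational cost noted above.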
This paper proposes a new Cell Separation Algorithm (CSA) to
enhance the optimal selection of markers for the prediction model. The
CSA imitates cell separation in the differential centrifugation
process. Based on the relationship between the rate of separation of
particles inside a heterogeneous mixture and the size and density of the
particles with just gravitational force applied, the larger the size/density
* Corresponding author.
E-mail addresses: n.jaddi@modares.ac.ir (N.S. Jaddi), saniee@modares.ac.ir (M. Saniee Abadeh).
https://doi.org/10.1016/j.compbiomed.2020.103747
Received 25 January 2020; Received in revised form 3 April 2020; Accepted 3 April 2020