Computers in Biology and Medicine 121 (2020) 103747
Available online 16 April 2020
0010-4825/© 2020 Elsevier Ltd. All rights reserved.

DNA methylation-based age prediction using cell separation algorithm

Najmeh Sadat Jaddi, Mohammad Saniee Abadeh *
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran

ARTICLE INFO

Keywords: Age prediction; Cell separation algorithm; Regression; DNA methylation data

ABSTRACT

The age of an individual can be predicted from the way DNA methylation changes with age. In this paper, an age prediction method is developed to solve multivariate regression problems on DNA methylation data by optimizing an artificial neural network (ANN) model with a newly proposed algorithm named the Cell Separation Algorithm (CSA). The CSA imitates cell separation by differential centrifugation, a process that involves multiple centrifugation steps with the rotor speed increased at each step. The CSA acts similarly to the centrifugal force: it separates the solutions according to their objective function values over several steps, with the velocity increasing at each step. Firstly, 25 test functions are used to test the CSA. Secondly, the CSA is examined on three forms of age prediction problems from two body fluids (blood and saliva). Healthy blood samples, diseased blood samples, and saliva samples are used to test the method's capability. The results of the CSA are compared not only with other methods proposed in previous studies, but also with the results from stochastic gradient descent (SGD), ADAM, and a genetic algorithm (GA). The CSA model results are substantially better than those of the four methods proposed in previous works that did not use an ANN training process. The CSA also outperformed SGD and ADAM, which employ the ANN model without ANN optimization by meta-heuristics. The CSA results are comparable (or even superior) to those of the GA model, which takes advantage of both ANN and meta-heuristics.

1. 
Introduction

Aging is a natural process in human life and is affected by many factors such as genetics, environment, disease, and lifestyle [1]. Predicting an individual's age from skeletal markers such as bones and teeth is limited by the need for a skeleton to analyse. Subsequent research found that gene expression, cellular structures, and telomeres in organisms are changed by the aging process [2,3]. Therefore, the prediction of age is possible from this information.

Methylated DNA is biologically and chemically more stable than other biomarkers [4]. Several past studies on the selection of optimal markers have used a large number of CpG sites to obtain significant prediction accuracy [5,6]. Furthermore, the majority of studies selected markers from those reported in previously published research. Another approach is to employ datasets of DNA methylation values from identified CpG sites on the 27 K or 450 K microarrays [7]. Because marker selection is important to the accuracy of the prediction, an appropriate technique is needed to determine the optimum markers for the prediction model.

Among the available methods, filter-based and wrapper-based approaches are the most common. Filter-based methods evaluate either each feature separately (univariate filters) or a subset of the full feature set (multivariate filters). They measure fundamental properties of the features related to class bias, and they are independent of any learning method [8]. In contrast, wrapper-based methods combine the search and the learning method in a single approach [9]. Such methods select the subset of genes that gives the classification its maximum accuracy. Filter-based methods have the advantage of less computation and faster execution time compared to wrapper-based methods [8].
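
The contrast between the two families can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data stand in for methylation values at four hypothetical CpG sites, the filter scores each site by absolute Pearson correlation with age, and the wrapper is a greedy forward search scored by the leave-one-out error of a small k-nearest-neighbour regressor. The function names (`filter_rank`, `wrapper_forward`, `make_toy_data`) are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def filter_rank(X, y):
    """Univariate filter: score every feature on its own; no model is trained."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_knn_error(X, y, subset, k=3):
    """Leave-one-out mean absolute error of a k-NN regressor restricted to `subset`."""
    total = 0.0
    for i, xi in enumerate(X):
        # Squared distance to every other sample, using only the chosen features.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in subset), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        pred = sum(age for _, age in dists[:k]) / k
        total += abs(pred - y[i])
    return total / len(X)


def wrapper_forward(X, y, max_features=2):
    """Wrapper: greedy forward search that re-evaluates the model for each candidate subset."""
    selected, best_err = [], float("inf")
    while len(selected) < max_features:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        errs = {j: loo_knn_error(X, y, selected + [j]) for j in candidates}
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:
            break  # no candidate improves the model; stop early
        selected.append(j_best)
        best_err = errs[j_best]
    return selected, best_err


def make_toy_data(n=40, seed=0):
    """Synthetic data (assumption): two age-related sites and two pure-noise sites."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.uniform(20, 80)
        X.append([
            0.2 + 0.008 * age + rng.gauss(0, 0.02),  # rises with age
            0.9 - 0.006 * age + rng.gauss(0, 0.02),  # falls with age
            rng.random(),                            # noise
            rng.random(),                            # noise
        ])
        y.append(age)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("filter ranking:", filter_rank(X, y))
    print("wrapper selection and error:", wrapper_forward(X, y))
```

In this sketch the filter step never trains the regressor, whereas the wrapper step refits and rescores it for every candidate subset, which is where the extra computation of wrapper-based methods comes from.
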
However, wrapper-based approaches appear to provide more reliable results than filters because they maximize the accuracy directly.

* Corresponding author.
E-mail addresses: n.jaddi@modares.ac.ir (N.S. Jaddi), saniee@modares.ac.ir (M. Saniee Abadeh).
Journal homepage: http://www.elsevier.com/locate/compbiomed
https://doi.org/10.1016/j.compbiomed.2020.103747
Received 25 January 2020; Received in revised form 3 April 2020; Accepted 3 April 2020

This paper proposes a new Cell Separation Algorithm (CSA) to enhance the optimal selection of markers for the prediction model. The CSA imitates the act of cell separation in the differential centrifugation process. Based on the relationship between the rate of separation of particles inside a heterogeneous mixture and the size and density of the particles when only gravitational force is applied, the larger the size/density