Contents lists available at ScienceDirect
Computational Biology and Chemistry
journal homepage: www.elsevier.com/locate/cbac

Research Article

Intelligent system based on data mining techniques for prediction of preterm birth for women with cervical cerclage

Hasan Rawashdeh a, Shatha Awawdeh b, Fatima Shannag b, Esraa Henawi b, Hossam Faris b,*, Nadim Obeid b, Jon Hyett c

a Department of Obstetrics and Gynaecology, Jordan University of Science and Technology, Jordan
b King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
c Discipline of Obstetrics, Gynaecology and Neonatology, University of Sydney, Sydney, Australia

ARTICLE INFO

Keywords: Preterm birth; Prediction system; Cerclage; Data mining

ABSTRACT

Preterm birth, defined as a delivery before 37 weeks' gestation, continues to affect 8–15% of all pregnancies and is associated with significant neonatal morbidity and mortality. Effective prediction of timing of delivery among women identified to be at significant risk for preterm birth would allow proper implementation of prophylactic therapeutic interventions. This paper aims first to develop a model that acts as a decision support system for pregnant women at high risk of delivering prematurely before having cervical cerclage. The model will predict whether the pregnancy will continue beyond 26 weeks' gestation and the potential value of adding the cerclage in prolonging the pregnancy. The second aim is to develop a model that predicts the timing of spontaneous delivery in this high risk cohort after cerclage. The model will help treating physicians to define the chronology of management in relation to the risk of preterm birth, reducing the neonatal complications associated with it. Data from 274 pregnancies managed with cervical cerclage were included. 29 of the procedures involved multiple pregnancies.
To build the first model, a data balancing technique called SMOTE was applied to overcome the problem of highly imbalanced class distribution in the dataset. After that, four classification models, namely Decision Tree, Random Forest, K-Nearest Neighbors (K-NN), and Neural Network (NN), were used to build the prediction model. The results showed that the Random Forest classifier gave the best results in terms of G-mean and sensitivity, with values of 0.96 and 1.00, respectively. These results were achieved at an oversampling ratio of 200%. For the second prediction model, five classification models were used to predict the time of spontaneous delivery: linear regression, Gaussian process, Random Forest, K-star, and LWL classifier. The Random Forest classifier performed best, with a correlation value of 0.752. In conclusion, computational models can be developed to predict the need for cerclage and the gestation of delivery after this procedure. These models have moderate/high sensitivity for clinical application.

1. Introduction

Preterm birth is defined by the World Health Organization (WHO) as a delivery before 37 completed weeks of gestation (Organization et al., 1977). This is usually subdivided based on gestational age into: extremely preterm (< 28 weeks), very preterm (28–31 weeks), and moderate and late preterm (32–36 weeks). Preterm birth affects about 5–18% of all pregnancies worldwide (Blencowe et al., 2013a). Currently, it is the leading cause of death under 5 years of age, responsible for nearly one million deaths in the same age group (Liu et al., 2016), and its complications are the single largest direct cause of neonatal deaths, responsible for more than one third of them (Blencowe et al., 2013a). Furthermore, survivors of preterm birth have a significant risk of ongoing morbidity, including neurodevelopmental impairment, cognitive dysfunction, learning difficulties, visual problems, and growth problems (Blencowe et al., 2013a).
For example, neurodevelopmental impairment was estimated to affect 52% of newborns at < 28 weeks, 24% of newborns at 28–31 weeks, and 5% of newborns at 32–36 weeks (Blencowe et al., 2013b). From an economic point of view, preterm birth places a heavy burden on families, national health services, and health insurance agencies. For instance, the annual economic burden associated with preterm birth in the United States was 26.2 billion dollars in

https://doi.org/10.1016/j.compbiolchem.2020.107233
Received 24 December 2019; Received in revised form 7 February 2020; Accepted 8 February 2020
* Corresponding author.
E-mail addresses: hmrawashdeh@just.edu.jo (H. Rawashdeh), sda9170256@fgs.ju.edu.jo (S. Awawdeh), fat9170271@fgs.ju.edu.jo (F. Shannag), asr9170277@fgs.ju.edu.jo (E. Henawi), hossam.faris@ju.edu.jo (H. Faris), obein@ju.edu.jo (N. Obeid), jon.hyett@sswahs.nsw.gov.au (J. Hyett).
Computational Biology and Chemistry 85 (2020) 107233
Available online 15 February 2020
1476-9271/ © 2020 Elsevier Ltd. All rights reserved.

T