Automated Assessment of Bone Age Using Deep Learning and Gaussian Process Regression

Tom Van Steenkiste¹, Joeri Ruyssinck¹, Olivier Janssens¹, Baptist Vandersmissen¹, Florian Vandecasteele¹, Pieter Devolder², Eric Achten², Sofie Van Hoecke¹, Dirk Deschrijver¹, and Tom Dhaene¹

Abstract— Bone age is an essential measure of skeletal maturity in children with growth disorders. It is typically assessed by a trained physician using radiographs of the hand and a reference model. However, these reference models have been reported to leave room for interpretation, which leads to large inter-observer and intra-observer variation. In this work, we explore a novel method for automated bone age assessment to assist physicians with their estimation. The method combines deep learning with Gaussian process regression. This combination exploits the sensitivity of the deep learning model to rotations and flips of the input images, increasing overall predictive performance compared to using the deep learning network alone. We validate our approach retrospectively on a set of 12611 radiographs of patients between 0 and 19 years of age.

I. INTRODUCTION

Bone age assessment is used in medicine to measure the skeletal and biological maturity of children [1]. Among other uses, it can be used to estimate final adult height [2], to measure the therapeutic effect in patients with endocrine disorders [3], or to estimate the age of asylum seekers [4]. In the traditional method, a trained physician compares a radiograph of the left hand and wrist with reference standards that depict normal skeletal development at each age. An example of such a reference standard is the hand atlas of Greulich and Pyle (G&P) [2]. However, using such a reference is a lengthy process and leaves room for interpretation, leading to large inter-observer and intra-observer differences.
The average spread of inter-observer differences has been reported to be as large as 11.5 months for the G&P method [5]. This complicates comparing estimates across patients, or for the same patient over time. Furthermore, different assessment methods yield different bone age estimates [1].

To reduce these variations, the community has identified and explored the potential of automated methods to assist the physician [6], [7], [8]. These methods rely on segmenting the images and extracting typical bone age features from them. However, including a segmentation step in the processing pipeline can be a significant disadvantage, as it is challenging to make such methods robust to large variations in image quality.

¹ Tom Van Steenkiste, Joeri Ruyssinck, Olivier Janssens, Baptist Vandersmissen, Florian Vandecasteele, Sofie Van Hoecke, Dirk Deschrijver and Tom Dhaene are with Ghent University - imec, IDLab, Technologiepark-Zwijnaarde 15, B-9052 Ghent, Belgium (tomd.vansteenkiste@ugent.be)
² Pieter Devolder and Eric Achten are with University Hospital (UZ) Ghent, Department of Radiology, De Pintelaan 185, B-9000 Ghent, Belgium

In other medical domains, deep learning [9] has proven to be a successful method for image analysis; an example is the automated detection of mitosis in breast cancer histology images [10]. In bone age assessment, recent deep learning examples include [11], where an automated tool is shown to improve the efficiency of reviewers, and [12], where a fully automated setup produces estimates that are accurate to within 1 year 92.23% of the time, with an average spread of 10.52 months for patients between 5 and 18 years of age. In this work, we explore a novel machine learning approach for bone age estimation that aims to improve upon state-of-the-art deep learning performance.
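The accuracy figures cited for [12] combine two common summary statistics: the share of estimates within one year of the reference reading, and an average spread. A minimal Python sketch of both is given here, assuming that "average spread" means the mean absolute difference in months between an estimate and the physician's reading (the text does not define it precisely). The function names and the toy values are hypothetical and do not come from [12].

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute difference (months) between estimated and reference bone ages."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


def fraction_within(predicted, reference, tolerance_months=12.0):
    """Share of estimates whose absolute error is at most `tolerance_months` (default: 1 year)."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance_months)
    return hits / len(predicted)


if __name__ == "__main__":
    predicted = [120, 100, 150, 60]  # hypothetical model estimates (months)
    reference = [126, 118, 150, 55]  # hypothetical physician readings (months)
    print(mean_absolute_error(predicted, reference))  # 7.25
    print(fraction_within(predicted, reference))      # 0.75
```

On these toy values the absolute errors are 6, 18, 0 and 5 months, so one of the four estimates falls outside the one-year tolerance.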
Our method is based on combining deep learning with Gaussian Process Regression (GPR) [13] to exploit the sensitivity of the deep learning predictions to rotations and flips of the radiographs. Section II describes the dataset. Section III explains the methodology, and Section IV presents and discusses the results of our experiments. Finally, Section V details future work and draws conclusions.

II. DATASET

The dataset used in this work consists of 12611 radiographs of the hand and wrist, collected by the Radiological Society of North America for the Pediatric Bone Age Prediction Challenge [14]. The institutional review boards of the organizing committee approved the study. The dataset contains 6833 radiographs of male patients and 5778 radiographs of female patients. The annotated bone ages, estimated by trained physicians using the G&P hand atlas, range from 0 to 228 months. As shown in Fig. 1, the age distribution of the dataset is not uniform.

Fig. 2 shows several example radiographs from the dataset. Size, orientation, brightness and contrast differ across the samples. Some radiographs contain additional artifacts such as watches, plaster casts, surgical screws and assisting nurses, and sometimes parts of the hand, such as fingers, are missing. These artifacts greatly complicate the traditional segmentation methods discussed in Section I.

Given the size of the dataset, we split the data into a single train/validation/test set instead of performing k-fold cross-validation, to reduce computational demands. First, the data is split by gender. Next, for each gender, an age-stratified split is generated based on the bone age estimated by trained physicians. Table I provides an overview of the distribution of the patients across the sub-datasets.
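The two-stage split described above (first by gender, then age-stratified within each gender) can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: it assumes each record carries an identifier, a gender label and the physician-estimated bone age in months, and it stratifies on hypothetical 12-month age bins with illustrative 70/15/15 fractions. The actual subset sizes are those reported in Table I.

```python
import random
from collections import defaultdict


def age_stratified_split(records, fractions=(0.7, 0.15, 0.15), bin_months=12, seed=0):
    """Split records into train/validation/test sets per gender, stratified on bone age.

    records: iterable of (record_id, gender, bone_age_months) tuples.
    fractions, bin_months and seed are illustrative assumptions, not values from the paper.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must contain three values that sum to 1")
    rng = random.Random(seed)

    # Group by gender first, then by age bin within each gender, so that
    # every (gender, age bin) stratum is split independently.
    strata = defaultdict(list)
    for record_id, gender, age_months in records:
        strata[(gender, int(age_months // bin_months))].append(record_id)

    splits = {"train": [], "validation": [], "test": []}
    for key in sorted(strata):  # sorted keys give a reproducible order
        ids = strata[key]
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        # Consecutive slices partition the stratum: each record lands in exactly one subset.
        splits["train"].extend(ids[:n_train])
        splits["validation"].extend(ids[n_train:n_train + n_val])
        splits["test"].extend(ids[n_train + n_val:])
    return splits


if __name__ == "__main__":
    # Hypothetical toy records: (id, gender, bone age in months).
    toy = [(i, "M" if i % 2 else "F", (i * 7) % 229) for i in range(1000)]
    subsets = age_stratified_split(toy)
    print({name: len(ids) for name, ids in subsets.items()})
```

Splitting each (gender, age bin) stratum separately keeps the non-uniform age distribution of Fig. 1 roughly the same in all three subsets, so that no subset is dominated by a narrow age range.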