Environmental Modelling and Software 139 (2021) 105002
Available online 25 February 2021
1364-8152/© 2021 Elsevier Ltd. All rights reserved.

Precipitation reconstruction from climate-sensitive lithologies using Bayesian machine learning

Rohitash Chandra d,b,*, Sally Cripps b,d, Nathaniel Butterworth c, R. Dietmar Müller a

a EarthByte Group, School of Geosciences, University of Sydney, NSW, 2006, Sydney, Australia
b Data Analytics for Resources and Environments, Australian Research Council - Industrial Transformation Training Centre, Australia
c Sydney Informatics Hub, University of Sydney, NSW, 2006, Sydney, Australia
d School of Mathematics and Statistics, University of Sydney, NSW, 2006, Sydney, Australia

ARTICLE INFO
Keywords: Paleo-climate; Gaussian process; Bayesian methods; Forecasting; Precipitation

ABSTRACT
Although global circulation models (GCMs) have been used to reconstruct precipitation for selected geological time slices, there is no coherent set of precipitation models for the Mesozoic-Cenozoic period (the last 250 million years). Climate changed dramatically during this period, which spans a supercontinent hothouse climate as well as continental breakup and dispersal associated with successive greenhouse and icehouse climate periods. We present an approach that links climate-sensitive sedimentary deposits such as coal, evaporites and glacial deposits to a global plate model, reconstructed paleo-elevation maps and high-resolution GCMs via Bayesian machine learning. We model the joint distribution of climate-sensitive sediments and annual precipitation through geological time, and use the dependency between sediments and precipitation to improve the model's predictive accuracy.
Our approach provides a set of 13 data-driven global paleo-precipitation maps between 14 and 249 Ma, capturing major changes in long-term annual rainfall patterns as a function of plate tectonics, paleo-elevation and climate change at a low computational cost.

1. Introduction

Palaeoclimatology refers to the study or reconstruction of ancient climates (Crowley and North, 1991; Bradley, 1999), often linked to the goal of understanding the current climate and its potential future trajectories (Hansen and Sato, 2012). The two primary variables used to define climate are temperature and precipitation. We focus on reconstructing the long-term history of precipitation, which is reflected in the geological record of climate-sensitive sedimentary deposits (Boucot et al., 2013a). Such a reconstruction involves several challenges. First, observational data constraining precipitation over geological time spans covering millions of years are sparse, both temporally and spatially (Boucot et al., 2013a). Second, the information from observational data must be fused with knowledge of the geophysical processes in a logically consistent statistical framework or model (Birchfield et al., 1981; Crowley, 1988; Glancy et al., 1993; Patzkowsky et al., 1991; McGehee and Lehman, 2012; Stocker et al., 1992; Phipps et al., 2013; Ritz et al., 2011; Wang and Mysak, 2000; Contreras et al., 2019; Arıkan, 2015; Sellwood and Valdes, 2006). Third, the data are often noisy and become increasingly uncertain the further back in time we go (Mann and Rutherford, 2002; Steiger et al., 2014; McIntyre and McKitrick, 2009). These characteristics increase the uncertainty about ancient climates, which must be accurately quantified for meaningful inference using the data and the model parameters.

The evolution of precipitation through geological time can be modelled using fully-coupled global circulation models (GCMs) (e.g. Herold et al., 2011; Lunt et al., 2017; Baatsen et al., 2020).
However, a single model of this type for an individual geological time slice typically takes several months to run on a high-performance computer, which limits the usefulness of this approach for developing models over geological time. In addition, the preparation of initial and boundary conditions for such models is time-consuming. Only a limited number of geological time slices have been explored, given the enormous computational resources required to construct a single GCM-based model. Some models have focused on past hothouse climates, such as those in parts of the Miocene (Herold et al., 2011) and Eocene (Baatsen et al., 2020). A major challenge in this area of research is developing improved methods to quantify climate model uncertainty. Combining climate proxies with Bayesian inference is seen as having great potential for assessing uncertainties and directly linking climate proxies with climate simulations

* Corresponding author. School of Mathematics and Statistics, University of Sydney, NSW, 2006, Sydney, Australia.
E-mail address: rohitash.chandra@unsw.edu.au (R. Chandra).
https://doi.org/10.1016/j.envsoft.2021.105002
Accepted 16 February 2021