Reconstructing past biomes states using machine learning and
modern pollen assemblages: A case study from Southern Africa
Magdalena K. Sobol a,*, Louis Scott b, Sarah A. Finkelstein a
a Department of Earth Sciences, University of Toronto, 22 Russell St, Toronto, M5S 3B1, Canada
b Department of Plant Sciences, University of the Free State, PO Box 339, Bloemfontein, 9300, South Africa
article info
Article history:
Received 27 September 2018
Received in revised form 28 February 2019
Accepted 25 March 2019
Available online 4 April 2019
Keywords:
Pollen datasets
Data analysis
Objective classification
Biomes
Vegetation dynamics
Vegetation reconstructions
Late Pleistocene
Holocene
abstract
Fossil pollen assemblages can assist in understanding biome responses to global climate change if there
is reasonable probability that they represent specific biomes or bioregions. In this paper, we introduce a
novel probabilistic presentation of pollen data and biome assignment. We apply a recently developed
pollen-based vegetation classification method utilizing supervised machine learning to Southern Africa
modern pollen assemblages. We present an updated modern pollen dataset from Southern Africa, linking
the sites to previously defined vegetation units. Ultimately, we generate probabilistic classifications
for fossil assemblages to reconstruct past vegetation.
The modern pollen dataset (N = 211 sites) represents a long vegetation gradient, from desert to forest
biomes, capturing broad climate gradients ranging from arid to subtropical. We validate two models
using the Random Forest algorithm to classify modern vegetation at different spatial resolutions:
subcontinental (biomes) and regional (bioregions). When the modern pollen assemblages (N = 164 sites) are
used to predict the vegetation types, classification accuracy varies by vegetation type. In our
dataset of 164 sites, the biome model performs best for pollen assemblages from savanna (91%
correct), grassland (87%), and coastal forest (82%) vegetation types, while the best results for
classification of regional vegetation are achieved for sub-humid savanna (95%), dry savanna (95%), coastal forest
(91%), and wet grassland (90%).
We apply the models to a fossil pollen sequence at Wonderkrater in the South African savanna, to
reconstruct subcontinental and regional changes in past vegetation states over the last 60 000 years. The
most probable vegetation state dominating the region since the Late Pleistocene is sub-humid savanna,
yet grassland occurred at times associated with high vegetation variability. Within the record, the most
frequent and amplified variability in the inferred vegetation states occurred during the transitional phase
between the Late Pleistocene and the Holocene. The machine learning approach for reconstructing past
vegetation offers a more complex and nuanced view of past vegetation dynamics and has the potential
to support quantitative proxy-based techniques for palaeoclimatic reconstructions.
© 2019 Published by Elsevier Ltd.
1. Introduction
Modeling complex biomes using multivariate proxy data is
ecologically challenging and computationally expensive. At the
time of its development, the biomization method (Prentice et al.,
1992, 1996) was a revolutionary approach to modeling biomes
from pollen data. The method rests on the assumption that the
functional relationship between form and function of a few key
plants may be substituted for biomes and biome modeling. Biomes
are classified using pollen assemblages through plant functional
types (PFTs); to link pollen assemblages to biomes, two binary
matrices, one assigning pollen assemblages to PFTs and the other
assigning PFTs to biomes, are multiplied (Prentice et al., 1992).
Thus, the biomization method reduces large complex datasets to a
smaller number of representative plants.
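The matrix step described above can be illustrated with a minimal sketch. It shows only the product of two binary membership matrices, not the full biomization procedure, and it assumes that the rows of the first matrix are individual pollen taxa. The taxon, PFT, and biome labels and all matrix entries below are hypothetical toy values chosen for illustration; they are not taken from Prentice et al. or from the dataset in this paper.

```python
# Toy illustration of linking pollen taxa to biomes through PFTs.
# All labels and matrix entries are hypothetical, for illustration only.

TAXA = ["Poaceae", "Podocarpus", "Acacia"]
PFTS = ["grass", "evergreen_tree", "savanna_tree"]
BIOMES = ["grassland", "forest", "savanna"]

# Binary matrix: rows are taxa, columns are PFTs (1 = taxon belongs to PFT).
TAXON_PFT = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# Binary matrix: rows are PFTs, columns are biomes (1 = PFT occurs in biome).
PFT_BIOME = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def taxon_biome_matrix(taxon_pft, pft_biome):
    """Multiply the two membership matrices and reduce counts to 0/1."""
    product = matmul(taxon_pft, pft_biome)
    return [[1 if value > 0 else 0 for value in row] for row in product]


# Rows are taxa, columns are biomes (1 = taxon can indicate that biome).
TAXON_BIOME = taxon_biome_matrix(TAXON_PFT, PFT_BIOME)

for taxon, row in zip(TAXA, TAXON_BIOME):
    linked = [biome for biome, flag in zip(BIOMES, row) if flag]
    print(taxon, "->", linked)
```

In this toy case the grass PFT is assigned to both grassland and savanna, so a single taxon can point to more than one biome; the product matrix makes that overlap explicit.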
The method, however, relies on a few key pollen taxa to represent
biomes. Methods considering whole pollen datasets may provide
additional nuances particularly applicable to periods of high
climatic variability (Williams et al., 2004). Moreover, PFTs created for
one region cannot be easily applied to another region. Thus, new
PFTs must be created for new contexts. Methodological biases can
* Corresponding author.
E-mail address: magdalena.sobol@mail.utoronto.ca (M.K. Sobol).
Quaternary Science Reviews
https://doi.org/10.1016/j.quascirev.2019.03.027
0277-3791/© 2019 Published by Elsevier Ltd.
Quaternary Science Reviews 212 (2019) 1–17