https://doi.org/10.1007/s10661-019-7428-x
Applying machine learning to forecast daily Ambrosia
pollen using environmental and NEXRAD parameters
Gebreab K. Zewdie · Xun Liu · Daji Wu ·
David J. Lary · Estelle Levetin
Received: 4 January 2017 / Accepted: 20 March 2019
© Springer Nature Switzerland AG 2019
This article is part of the Topical Collection on Geospatial
Technology in Environmental Health Applications
G. K. Zewdie (✉) · X. Liu · D. Wu · D. J. Lary
William B. Hanson Center for Space Sciences,
The University of Texas at Dallas, Richardson, TX, USA
e-mail: gebreab.zewdie@utdallas.edu
X. Liu
e-mail: xun.liu@utdallas.edu
D. Wu
e-mail: daji.wu@utdallas.edu
D. J. Lary
e-mail: david.lary@utdallas.edu
E. Levetin
The University of Tulsa, Tulsa, OK 74104, USA
e-mail: estelle-levetin@utulsa.edu
Abstract Approximately 50 million Americans have
allergic diseases. Airborne plant pollen is a significant
trigger for several of these allergic diseases. Ambrosia
(ragweed) is known for its abundant production of
pollen and its potent allergic effect in North America.
Hence, estimating and predicting the daily atmospheric
concentration of pollen (ragweed pollen in particular)
is useful both for people with allergies and for the
health professionals who care for them. In this study,
we show that a suite of variables including
meteorological and land surface parameters, as well
as next-generation radar (NEXRAD) measurements,
together with machine learning can be used to
successfully estimate the daily pollen concentration.
The supervised machine learning approaches we used
included random forests, neural networks, and support
vector machines. The performance of the trained models
is independently validated on 10% of the original
dataset, set aside using the holdout cross-validation
method. The random forests (R = 0.61, R² = 0.37),
support vector machines (R = 0.51, R² = 0.26), and
neural networks (R = 0.46, R² = 0.21) effectively
predicted the daily Ambrosia pollen concentration,
where the correlation coefficient (R) and R-squared
(R²) values are given in parentheses. Three independent
approaches (random forests, correlation coefficients,
and interaction information) were employed to rank
the relative importance of the available predictors.
Keywords Pollen · Machine learning ·
Environmental parameters · NEXRAD measurements
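The validation scheme and the two scores quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names holdout_split and pearson_r, the sample count of 1000, and the random seed are assumptions made for illustration only. The sketch reserves 10% of the sample indices for validation, computes a Pearson correlation coefficient, and checks that the reported R² values agree, to two decimals, with the square of the reported R values.

```python
import math
import random


def holdout_split(n_samples, test_fraction=0.10, seed=0):
    """Shuffle indices 0..n_samples-1 and reserve a fraction for validation.

    Hypothetical helper: the paper only states that 10% of the data
    was held out, not how the partition was drawn.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# R values quoted in the abstract; R^2 is taken here as the squared correlation.
reported_r = {
    "random forests": 0.61,
    "support vector machines": 0.51,
    "neural networks": 0.46,
}
r_squared = {name: round(r * r, 2) for name, r in reported_r.items()}

# Assumed dataset size of 1000 samples, purely for illustration.
train_idx, test_idx = holdout_split(1000)
```

In practice, pearson_r would be applied to the observed and predicted pollen concentrations on the held-out indices only, so that the score reflects data the model never saw during training.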
Introduction
Pollen is known to be a trigger for allergic diseases,
e.g., asthma, hay fever, and allergic rhinitis (Oswalt
and Marshall 2008; Howard and Levetin 2014). It
is interesting that a variety of non-respiratory issues
such as strokes (Low et al. 2006; Matheson et al.
2008), and surprisingly, even suicide and attempted
suicide (Postolache et al. 2005; Stickley et al. 2017)
Environ Monit Assess (2019) 191(Suppl 2): 261