Citation: Gaumard, R.; Dragún, D.; Pedroza-Montero, J.N.; Alonso, B.; Guesmi, H.; Malkin Ondík, I.; Mineva, T. Regression Machine Learning Models Used to Predict DFT-Computed NMR Parameters of Zeolites. Computation 2022, 10, 74. https://doi.org/10.3390/computation10050074
Academic Editor: Henry Chermette
Received: 30 March 2022
Accepted: 10 May 2022
Published: 13 May 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
computation
Article
Regression Machine Learning Models Used to Predict
DFT-Computed NMR Parameters of Zeolites
Robin Gaumard 1, Dominik Dragún 2, Jesús N. Pedroza-Montero 1, Bruno Alonso 1, Hazar Guesmi 1, Irina Malkin Ondík 2,3 and Tzonka Mineva 1,*
1 ICGM, CNRS, ENSCM, Universite de Montpellier, 34296 Montpellier, France; robin.gaumard@enscm.fr (R.G.); jesus-nain.pedroza-montero@umontpellier.fr (J.N.P.-M.); bruno.alonso@enscm.fr (B.A.); hazar.guesmi@enscm.fr (H.G.)
2 FIIT STU in Bratislava, Ilkovičova 2, 84216 Bratislava, Slovakia; domco.dragun@gmail.com (D.D.); malkin.ondik@gmail.com (I.M.O.)
3 MicroStep-MIS, Spol. S.R.O., Čavojského 1, 84104 Bratislava, Slovakia
* Correspondence: tzonka.mineva@enscm.fr
Abstract: Machine learning approaches can drastically decrease the computational time needed to
predict spectroscopic properties of materials, while preserving the quality of the underlying computational
approaches. We studied the performance of kernel-ridge regression (KRR) and gradient boosting
regressor (GBR) models trained on isotropic shielding values, computed with density-functional
theory (DFT), in a series of different known zeolites containing out-of-frame metal cations or fluorine
anions and organic structure-directing cations. The smooth overlap of atomic positions (SOAP) descriptors
were computed from the DFT-optimised Cartesian coordinates of each atom in the zeolite crystal
cells. Using these descriptors as inputs to both machine learning regression methods led to
predictions of the DFT isotropic shielding values with mean errors within 0.6 ppm. The results
showed that the GBR model scales better than the KRR model.
Keywords: NMR; machine learning; zeolites
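As a rough illustration of the kernel-ridge regression idea named in the abstract, the sketch below fits a KRR model to synthetic one-dimensional data and reports a mean absolute error. The Gaussian kernel, the hyperparameters `GAMMA` and `LAM`, the toy targets, and all function names are assumptions made for illustration only; they are not the SOAP descriptors, the DFT data, or the settings used in this work.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krr_fit(X, y, gamma, lam):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, y)

def krr_predict(X_train, alpha, x_new, gamma):
    """Prediction as a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xt, x_new, gamma)
               for a, xt in zip(alpha, X_train))

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-ins: 1-D "descriptors" and a smooth target (not real shieldings).
X_train = [[0.5 * i] for i in range(12)]
y_train = [math.sin(x[0]) for x in X_train]
GAMMA, LAM = 1.0, 1e-6  # hypothetical hyperparameters

alpha = krr_fit(X_train, y_train, GAMMA, LAM)
train_pred = [krr_predict(X_train, alpha, x, GAMMA) for x in X_train]
train_mae = mean_absolute_error(y_train, train_pred)
test_pred = krr_predict(X_train, alpha, [2.25], GAMMA)  # unseen point
print(f"training MAE: {train_mae:.2e}")
print(f"prediction at 2.25: {test_pred:.4f}, target: {math.sin(2.25):.4f}")
```

In this sketch the dense linear solve over all training points is the step whose cost grows most steeply with the size of the training set.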
1. Introduction
Machine learning (ML) coupled with density functional theory (DFT) calculations
has been rapidly emerging as a tool for predicting nuclear magnetic resonance (NMR) isotropic
shielding values [1–9]. The role of experimental NMR investigations in characterising the local
atomic environment in chemical and biological systems has been established for decades.
Theoretical DFT calculations, using either the gauge-including atomic orbital (GIAO) or the
gauge-including projector augmented wave (GIPAW) method, have been widely employed to improve
the NMR signal assignments and/or to identify the local structural environment and
molecular interactions of the targeted nucleus [10,11]. The recent interest in
developing and applying ML models for the prediction of NMR parameters thus stems
from the need to obtain accurate theoretical NMR parameters rapidly.
Hitherto, several ML models [12] have been built and applied for predicting the NMR
isotropic shielding ($\sigma_{\mathrm{iso}}$) or, respectively, the chemical shift ($\delta = \sigma_{\mathrm{ref}} - \sigma_{\mathrm{iso}}$) of
$^{1}$H, $^{13}$C, $^{17}$O, and $^{15}$N nuclei in small organic and aromatic molecules or in molecular crystals [2,6,13–20].
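The relation between isotropic shielding and chemical shift given above can be sketched in a few lines. The reference shielding and the shielding values below are hypothetical numbers chosen for illustration, not results from this work, and the function name is likewise an assumption.

```python
def chemical_shift(sigma_iso, sigma_ref):
    """Chemical shift in ppm: delta = sigma_ref - sigma_iso."""
    return sigma_ref - sigma_iso

# Hypothetical values in ppm, for illustration only.
sigma_ref = 31.0
sigma_iso_values = [29.5, 27.0, 24.5]

shifts = [chemical_shift(s, sigma_ref) for s in sigma_iso_values]
print(shifts)  # [1.5, 4.0, 6.5]
```

A lower shielding gives a larger chemical shift, so a model that predicts the isotropic shielding also yields the chemical shift once a reference shielding is fixed.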
These ML models comprise deep neural networks (DNNs) [15], convolutional neural networks
(CNNs) [16], the IMPRESSION model based on kernel-ridge regression (KRR) [6,19,20],
linear-ridge regression [2], gradient boosting regression (GBR) [21,22], graph neural networks
(GNNs) [23,24], and the Δ-ML method [7]. Chemical shifts of proteins have been predicted
using random forest regression (RFR) [13,14,17,18]. Although training the models and
predicting the NMR parameters takes far less computational time than the GIAO
and GIPAW calculations, most of the ML models yielded results that agree somewhat less well
with the experimental data than the DFT $\sigma_{\mathrm{iso}}$ obtained with the PBE exchange–correlation