computation

Article

Regression Machine Learning Models Used to Predict DFT-Computed NMR Parameters of Zeolites

Robin Gaumard 1, Dominik Dragún 2, Jesús N. Pedroza-Montero 1, Bruno Alonso 1, Hazar Guesmi 1, Irina Malkin Ondík 2,3 and Tzonka Mineva 1,*

1 ICGM, CNRS, ENSCM, Université de Montpellier, 34296 Montpellier, France; robin.gaumard@enscm.fr (R.G.); jesus-nain.pedroza-montero@umontpellier.fr (J.N.P.-M.); bruno.alonso@enscm.fr (B.A.); hazar.guesmi@enscm.fr (H.G.)
2 FIIT STU in Bratislava, Ilkovičova 2, 84216 Bratislava, Slovakia; domco.dragun@gmail.com (D.D.); malkin.ondik@gmail.com (I.M.O.)
3 MicroStep-MIS, Spol. S.R.O., Čavojského 1, 84104 Bratislava, Slovakia
* Correspondence: tzonka.mineva@enscm.fr

Citation: Gaumard, R.; Dragún, D.; Pedroza-Montero, J.N.; Alonso, B.; Guesmi, H.; Malkin Ondík, I.; Mineva, T. Regression Machine Learning Models Used to Predict DFT-Computed NMR Parameters of Zeolites. Computation 2022, 10, 74. https://doi.org/10.3390/computation10050074

Academic Editor: Henry Chermette

Received: 30 March 2022
Accepted: 10 May 2022
Published: 13 May 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Machine learning approaches can drastically decrease the computational time for the predictions of spectroscopic properties in materials, while preserving the quality of the computational approaches.
We studied the performance of kernel-ridge regression (KRR) and gradient boosting regressor (GBR) models trained on the isotropic shielding values, computed with density-functional theory (DFT), in a series of different known zeolites containing out-of-frame metal cations, or fluoride anions and organic structure-directing cations. The smooth overlap of atomic positions (SOAP) descriptors were computed from the DFT-optimised Cartesian coordinates of each atom in the zeolite crystal cells. Using these descriptors as inputs to both machine learning regression methods led to predictions of the DFT isotropic shielding values with mean errors within 0.6 ppm. The results showed that the GBR model scales better than the KRR model.

Keywords: NMR; machine learning; zeolites

1. Introduction

Machine learning (ML) coupled with density functional theory (DFT) calculations has rapidly emerged as an approach for predicting nuclear magnetic resonance (NMR) isotropic shielding values [1–9]. The role of experimental NMR investigations in identifying the local atomic environment in chemical and biological systems has been established for decades. Theoretical DFT calculations, using either the gauge-invariant atomic orbital (GIAO) or the gauge-including projector augmented wave (GIPAW) method, have been widely employed to improve the NMR signal assignments and/or identify the local structural environment and molecular interactions of the targeted nucleus [10,11]. The recent interest in developing and applying ML models for the prediction of NMR parameters thus stems from the need to obtain accurate theoretical NMR parameters rapidly. Hitherto, several ML models [12] have been built and applied for predicting the NMR isotropic shielding ($\sigma_{\mathrm{iso}}$) or, respectively, the chemical shift ($\delta = \sigma_{\mathrm{ref}} - \sigma_{\mathrm{iso}}$) of $^{1}$H, $^{13}$C, $^{17}$O, and $^{15}$N nuclei in small organic, aromatic molecules or molecular crystals [2,6,13–20].
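The relation between isotropic shielding and chemical shift quoted above, together with the kind of mean error reported in the abstract, can be illustrated with a minimal Python sketch. The function names, the reference shielding, and all numerical values below are hypothetical and chosen only for illustration; they are not taken from the article, and the mean absolute error is assumed here as one plausible reading of "mean error".

```python
from statistics import mean


def shielding_to_shift(sigma_iso, sigma_ref):
    """Convert isotropic shieldings (ppm) to chemical shifts (ppm).

    Implements delta = sigma_ref - sigma_iso for each value.
    """
    return [sigma_ref - s for s in sigma_iso]


def mean_absolute_error(reference, predicted):
    """Mean absolute deviation (ppm) between two equally long sequences."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    return mean([abs(r - p) for r, p in zip(reference, predicted)])


# Hypothetical numbers for illustration only (not from the article).
sigma_dft = [430.0, 432.5, 435.0]  # assumed DFT shieldings, ppm
sigma_ml = [430.4, 432.0, 435.3]   # assumed ML-predicted shieldings, ppm
sigma_ref = 320.0                  # assumed reference shielding, ppm

delta_dft = shielding_to_shift(sigma_dft, sigma_ref)
delta_ml = shielding_to_shift(sigma_ml, sigma_ref)

mae_sigma = mean_absolute_error(sigma_dft, sigma_ml)
mae_delta = mean_absolute_error(delta_dft, delta_ml)

print(f"MAE on shieldings: {mae_sigma:.3f} ppm")
print(f"MAE on shifts:     {mae_delta:.3f} ppm")
```

Because the same reference shielding is subtracted from both the DFT and the ML values, it cancels in the difference, so an error measured on shieldings carries over unchanged to chemical shifts as long as a single reference is used.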
These ML models comprise deep neural networks (DNNs) [15], convolutional neural networks (CNNs) [16], the IMPRESSION model based on kernel-ridge regression (KRR) [6,19,20], linear-ridge regression [2], gradient boosting regression (GBR) [21,22], graph neural networks (GNNs) [23,24], and the Δ-ML method [7]. Chemical shifts of proteins have been predicted using random forest regression (RFR) [13,14,17,18]. Despite the strong reduction in the computational time needed to train the model and predict the NMR parameters, in comparison to the GIAO and GIPAW calculations, most of the ML models yielded somewhat less accurate results with respect to the experimental data than the DFT $\sigma_{\mathrm{iso}}$ with PBE exchange–correlation