Article

MetaFetcheR: An R Package for Complete Mapping of Small-Compound Data

Sara A. Yones 1,*, Rajmund Csombordi 1, Jan Komorowski 1,2,3,4 and Klev Diamanti 1,5,*

Citation: Yones, S.A.; Csombordi, R.; Komorowski, J.; Diamanti, K. MetaFetcheR: An R Package for Complete Mapping of Small-Compound Data. Metabolites 2021, 11, 743. https://doi.org/10.3390/metabo11110743

Academic Editor: Hunter N. B. Moseley

Received: 30 August 2021
Accepted: 27 October 2021
Published: 28 October 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1 Department of Cellular and Molecular Biology, Uppsala University, 751 24 Uppsala, Sweden; rajmund.csombordi@gmail.com (R.C.); jan.komorowski@icm.uu.se (J.K.)
2 Institute of Computer Science, Polish Academy of Sciences, 01-248 Warsaw, Poland
3 Washington National Primate Research Center, Seattle, WA 98121, USA
4 Swedish Collegium for Advanced Study, 752 38 Uppsala, Sweden
5 Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
* Correspondence: sara.younes@icm.uu.se (S.A.Y.); klev.diamanti@igp.uu.se (K.D.); Tel.: +46-76-592-2512 (S.A.Y.); +46-73-926-7648 (K.D.)

Abstract: Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish a means of data access in the early stages of a project might lead to mislabelled compounds, reduced statistical power, and long delays in the delivery of results.
We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of data-fetching use cases. We benchmarked the algorithm in three independent case studies based on two published datasets and showed that MetaFetcheR outperformed existing approaches and databases.

Keywords: small-compound databases; metabolomics; metabolites; queue-based algorithm

1. Introduction

Metabolomics allows the study of small-molecule substrates and compounds that are involved in metabolic processes. A small compound (<1500 Da) is a low-molecular-weight organic compound that is involved in, or may regulate, biological processes. Examples of small compounds include various sugars, lipids, and amino acids. Various complex diseases, such as type 2 diabetes and cancer, have been strongly linked to metabolic disorders, making metabolomics a highly relevant field for single- and multi-omics studies [1–3]. Pathway enrichment analysis is a widespread approach in metabolomics that requires metabolites to be mapped to a predefined set of unique identifiers [4]. In this setting, several issues arise when accessing, pre-processing, and analysing metabolite data. For instance, overlapping and non-overlapping information on metabolites is scattered across several small-compound databases, leading to major analysis and standardization issues [5–7]. Additional challenges arise with databases whose data contain multiple entries for one metabolite or are incomplete. Finally, foreign reference identifiers may be missing, making it difficult, and sometimes impossible, to link two records of the same metabolite in different databases, while in other cases, the few reference identifiers that are present might point to incorrect compounds.
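To make the linking problem concrete, the following sketch models each database record as a node and each foreign reference identifier as a link, then follows the links from a single seed identifier with a breadth-first queue. It is a language-neutral illustration written in Python, not MetaFetcheR's implementation or API: the record layout, the placeholder identifiers (H1, K1, C1, C2, P1, P9), and the helper functions are all hypothetical. It reproduces two of the issues described above: one compound reached through two different ChEBI entries, and a record with no foreign references that cannot be linked at all.

```python
from collections import deque

# Hypothetical toy records: each (database, identifier) pair maps to the
# foreign reference identifiers stored in that record. All identifiers are
# placeholders, not real database entries.
RECORDS = {
    ("hmdb", "H1"): {("kegg", "K1"), ("chebi", "C1")},
    ("kegg", "K1"): {("pubchem", "P1")},
    ("chebi", "C1"): set(),
    ("pubchem", "P1"): {("chebi", "C2")},  # points to a second ChEBI entry
    ("chebi", "C2"): set(),
    ("pubchem", "P9"): set(),  # record without any foreign references
}


def build_links(records):
    """Treat every cross-reference as a two-way link, since a reference is
    often stored in only one of the two records it connects."""
    links = {node: set() for node in records}
    for node, refs in records.items():
        for ref in refs:
            links[node].add(ref)
            links.setdefault(ref, set()).add(node)
    return links


def collect_identifiers(seed, records):
    """Breadth-first traversal from one identifier; returns every
    (database, identifier) pair reachable through cross-references."""
    links = build_links(records)
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def group_by_database(identifiers):
    """Group reachable identifiers per database; more than one identifier
    in the same database flags a possible duplicate or conflicting entry."""
    grouped = {}
    for db, ident in identifiers:
        grouped.setdefault(db, set()).add(ident)
    return grouped


if __name__ == "__main__":
    found = collect_identifiers(("hmdb", "H1"), RECORDS)
    for db, ids in sorted(group_by_database(found).items()):
        line = f"{db}: {sorted(ids)}"
        if len(ids) > 1:
            line += " (multiple entries)"
        print(line)
    # P9 carries no references, so no traversal from H1 can reach it.
    print(("pubchem", "P9") in found)
```

Treating each cross-reference as a two-way link is a design choice of this sketch: a reference recorded in only one of two databases still connects both records. The trade-off is that a single incorrect reference can merge unrelated compounds, which is one reason to flag databases that contribute more than one identifier instead of accepting them silently.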
The aforementioned issues delay the delivery of results and, more importantly, might lead to inconsistent or biased results. Xia and colleagues developed MetaboAnalyst, a versatile computational tool for metabolomics. This tool contains a module for mapping metabolite names to identifiers from the Human Metabolome Database (HMDB), Chemical Entities of Biological Interest (ChEBI), the Kyoto Encyclopedia of Genes and Genomes (KEGG), PubChem, and METLIN [5,7–11]. However, the lack of a shared nomenclature for metabolite