Article
MetaFetcheR: An R Package for Complete Mapping of
Small-Compound Data
Sara A. Yones 1,*, Rajmund Csombordi 1, Jan Komorowski 1,2,3,4 and Klev Diamanti 1,5,*
Citation: Yones, S.A.; Csombordi, R.; Komorowski, J.; Diamanti, K. MetaFetcheR: An R Package for Complete Mapping of Small-Compound Data. Metabolites 2021, 11, 743. https://doi.org/10.3390/metabo11110743
Academic Editor: Hunter N. B. Moseley
Received: 30 August 2021
Accepted: 27 October 2021
Published: 28 October 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Cellular and Molecular Biology, Uppsala University, 751 24 Uppsala, Sweden; rajmund.csombordi@gmail.com (R.C.); jan.komorowski@icm.uu.se (J.K.)
2 Institute of Computer Science, Polish Academy of Sciences, 01-248 Warsaw, Poland
3 Washington National Primate Research Center, Seattle, WA 98121, USA
4 Swedish Collegium for Advanced Study, 752 38 Uppsala, Sweden
5 Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
* Correspondence: sara.younes@icm.uu.se (S.A.Y.); klev.diamanti@igp.uu.se (K.D.);
Tel.: +46-76-592-2512 (S.A.Y.); +46-73-926-7648 (K.D.)
Abstract: Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish reliable means of data access in the early stages of a project might lead to mislabelled compounds, reduced statistical power, and large delays in the delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of data-fetching use cases. We benchmarked the algorithm in three independent case studies based on two published datasets and showed that MetaFetcheR outperformed existing approaches and databases.
Keywords: small-compound databases; metabolomics; metabolites; queue-based algorithm
1. Introduction
Metabolomics allows the study of small-molecule substrates and compounds that are involved in metabolic processes. A small compound (<1500 Da) is a low-molecular-weight organic compound that is involved in or may regulate biological processes. Examples of small compounds include various sugars, lipids, and amino acids. Various complex diseases, such as type 2 diabetes and cancer, have been strongly linked to metabolic disorders, making metabolomics a highly relevant field for single- and multi-omics studies [1–3]. Pathway enrichment analysis is a widespread analysis approach for metabolomics that requires metabolites to be mapped to a predefined set of unique identifiers [4]. In this setup, several issues arise when accessing, pre-processing, and analysing metabolite data. For instance, overlapping and non-overlapping information on metabolites is scattered across several small-compound databases, leading to major analysis and standardization issues [5–7]. Additional challenges occur with databases that deliver incomplete data or multiple entries for one metabolite. Finally, foreign reference identifiers may be missing, making it difficult, and sometimes impossible, to link two records of the same metabolite in different databases, while in other cases, the small fraction of reference identifiers that are present might point to incorrect compounds. These issues delay the delivery of results and, more importantly, might lead to inconsistent or biased results.
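The linking problem described above can be made concrete with a small sketch. This is not MetaFetcheR's implementation, only a toy illustration in Python of how a queue-based walk over cross-references might recover identifiers that a single record lacks and flag references that disagree. The record layout, the function name `resolve_identifiers`, and all identifiers (`H1`, `C1`, and so on) are hypothetical.

```python
from collections import deque

# Toy records keyed by (database, identifier). Each value lists the foreign
# reference identifiers that record carries. All identifiers are made up.
RECORDS = {
    ("hmdb", "H1"): {"chebi": "C1"},      # no KEGG or PubChem reference here
    ("chebi", "C1"): {"kegg": "K1"},
    ("kegg", "K1"): {"pubchem": "P1"},
    ("pubchem", "P1"): {},
}


def resolve_identifiers(start_db, start_id, records):
    """Follow cross-references breadth-first from one starting record.

    Returns the identifiers found per database and a list of references
    that contradict an identifier already accepted for that database.
    """
    resolved = {start_db: start_id}
    conflicts = []
    queue = deque([(start_db, start_id)])
    visited = set()

    while queue:
        key = queue.popleft()
        if key in visited:
            continue
        visited.add(key)
        for db, ident in records.get(key, {}).items():
            if db not in resolved:
                # New database reached: accept the identifier and explore it.
                resolved[db] = ident
                queue.append((db, ident))
            elif resolved[db] != ident:
                # Two records disagree about the identifier for this database.
                conflicts.append((key, db, ident))
    return resolved, conflicts


resolved, conflicts = resolve_identifiers("hmdb", "H1", RECORDS)
print(resolved)
print(conflicts)
```

Starting from the toy HMDB record, the walk reaches the KEGG and PubChem identifiers only through the intermediate ChEBI record. A reference that pointed back to a different HMDB identifier would be reported as a conflict and not silently accepted.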
Xia and colleagues developed MetaboAnalyst, a versatile computational tool for metabolomics. This tool contains a module that maps metabolite names to identifiers from the Human Metabolome Database (HMDB), Chemical Entities of Biological Interest (ChEBI), the Kyoto Encyclopedia of Genes and Genomes (KEGG), PubChem, and METLIN [5,7–11]. However, the lack of a shared nomenclature for metabolite