Cécile Chauvel, PhD, is a researcher in biostatistics in the Data Management and Analysis unit at Bioaster, Lyon, France.
Alexei Novoloaca is a PhD student in biostatistics in the Epigenetics Group at the International Agency for Research on Cancer, World Health Organization,
Lyon, France.
Pierre Veyre is a computer scientist in the Data Management and Analysis unit at Bioaster, Lyon, France.
Frédéric Reynier, PhD, is the head of the Genomics and Transcriptomics unit at Bioaster, Lyon, France.
Jérémie Becker, DPhil, is a researcher in biostatistics in the Genomics and Transcriptomics unit at Bioaster, Lyon, France.
BIOASTER is a technological research institute in microbiology that aims to develop innovative, high-value technology solutions
through collaborative projects. Its main interests lie in tackling antimicrobial resistance, developing new diagnostics, improving
vaccine safety and efficacy and understanding the involvement of the microbiome in human and animal health.
Submitted: 26 September 2018; Received (in revised form): 12 January 2019
© The authors 2019. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
Briefings in Bioinformatics, 21(2), 2020, 541–552
doi: 10.1093/bib/bbz015
Advance Access Publication Date: 14 February 2019
Review article
Evaluation of integrative clustering methods for the
analysis of multi-omics data
Cécile Chauvel∗, Alexei Novoloaca∗, Pierre Veyre, Frédéric Reynier and Jérémie Becker
Corresponding author: Jérémie Becker, BIOASTER Research Institute, 40 avenue Tony Garnier, 69007 Lyon, France. Tel.: +33 4 69 85 19 21;
Fax: +33 4 72 70 48 2; E-mail: jeremie.becker@bioaster.org
∗ The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Abstract
Recent advances in sequencing, mass spectrometry and cytometry technologies have enabled researchers to collect
large-scale omics data from the same set of biological samples. The joint analysis of multiple omics offers the opportunity
to uncover coordinated cellular processes acting across different omic layers. In this work, we present a thorough
comparison of a selection of recent integrative clustering approaches, including Bayesian (BCC and MDI) and matrix
factorization approaches (iCluster, moCluster, JIVE and iNMF). Based on simulations, the methods were evaluated on their
sensitivity and their ability to recover both the correct number of clusters and the simulated clustering at the common and
data-specific levels. Standard non-integrative approaches were also included to quantify the added value of integrative
methods. For most matrix factorization methods and one Bayesian approach (BCC), the shared and specific structures were
successfully recovered with high and moderate accuracy, respectively. The opposite behavior was observed for
non-integrative approaches, which performed well on specific structures only. Finally, we applied the methods to the
Cancer Genome Atlas breast cancer data set to check whether results based on experimental data were consistent with
those obtained in the simulations.
Key words: benchmark; clustering; data integration; multi-omics; unsupervised analysis
Introduction
The accumulation of large molecular data sets has fueled
the development of translational bioinformatics and systems
biology that share a holistic view on omics data. While the
former aims to link biological to clinical data to improve our
understanding of disease mechanisms, the latter explores the
basic functional properties of living organisms based on the
premise that biological processes build upon the interplay
between macromolecules. Both approaches rely on the idea
that biological mechanisms (and, more generally, phenotypic
traits) can only be fully captured through the study of molecular
interactions among different omics layers.
Multi-omic approaches have received much attention in
recent years for their potential clinical applications. In
genome-wide association studies, for example, the mechanisms
by which the identified loci influence phenotypes remain
generally unknown and are likely to be unveiled using functional