Citation: O’Connor, L.M.; O’Connor, B.A.; Zeng, J.; Lo, C.H. Data Mining of Microarray Datasets in Translational Neuroscience. Brain Sci. 2023, 13, 1318. https://doi.org/10.3390/brainsci13091318
Academic Editors: Rodrigo Pena, Paulo R. Protachevicz and Ricardo F. Ferreira
Received: 25 July 2023
Revised: 4 September 2023
Accepted: 10 September 2023
Published: 14 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Review
Data Mining of Microarray Datasets in Translational Neuroscience
Lance M. O’Connor 1, Blake A. O’Connor 2, Jialiu Zeng 3 and Chih Hung Lo 3,*
1 College of Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA; ocon0436@umn.edu
2 School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA; baoconnor2@wisc.edu
3 Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore; jialiu.zeng@ntu.edu.sg
* Correspondence: chihhung.lo@ntu.edu.sg
Abstract: Data mining involves the computational analysis of a plethora of publicly available datasets
to generate new hypotheses that can be further validated by experiments for the improved
understanding of the pathogenesis of neurodegenerative diseases. Although the number of sequencing
datasets is on the rise, microarray analyses conducted on diverse biological samples represent a
large collection of datasets with multiple web-based programs that enable efficient and convenient
data analysis. In this review, we first discuss the selection of biological samples associated with
neurological disorders, and the possibility of a combination of datasets, from various types of samples,
to conduct an integrated analysis in order to achieve a holistic understanding of the alterations in the
examined biological system. We then summarize key approaches and studies that have made use of
the data mining of microarray datasets to obtain insights into translational neuroscience applications,
including biomarker discovery, therapeutic development, and the elucidation of the pathogenic
mechanisms of neurodegenerative diseases. We further discuss the gap to be bridged between microarray
and sequencing studies to improve the utilization and combination of different types of datasets,
together with experimental validation, for more comprehensive analyses. We conclude by providing
future perspectives on integrating multi-omics, to advance precision phenotyping and personalized
medicine for neurodegenerative diseases.
Keywords: microarray analysis; biological samples; messenger RNA (mRNA); microRNA (miRNA);
circular RNA (circRNA); long non-coding RNA (lncRNA); multi-omics integration; translational
neuroscience; biomarker discovery; therapeutic development
1. Introduction
Over the past few decades, methods for quantifying the transcriptome have developed
and expanded from microarray gene expression and quantitative polymerase chain
reaction [1,2] to bulk RNA-seq and single-cell or single-nucleus RNA sequencing
(sc/snRNA-seq) [3]. RNA-seq techniques have been at the forefront of studies aimed at understanding
the heterogeneity of neurological diseases, including Alzheimer’s disease (AD), Parkinson’s
disease (PD), and multiple sclerosis (MS) [3,4]. RNA-seq also has the unique ability to
detect novel sequences and splice variants [3,5]. However, RNA-seq methods are generally
more labor intensive in data analysis and less cost efficient in terms of data storage,
and they may exhibit transcript length bias, which is currently mitigated by long-read
sequencing [5]. Although microarray gene expression analysis is limited to transcripts
that are already established for the model organism being analyzed, it is able to detect
highly varied genes [6]. Despite the technical differences, results from microarray and
RNA-seq analyses have been shown to be highly consistent with each other [7]. In the
context of data mining, microarray analysis is still widely adopted due to its low cost,
high efficiency, limited bias [8], greater statistical power [9], and the vast number of
public neuroscience datasets available [3,10].