Citation: O’Connor, L.M.; O’Connor, B.A.; Zeng, J.; Lo, C.H. Data Mining of Microarray Datasets in Translational Neuroscience. Brain Sci. 2023, 13, 1318. https://doi.org/10.3390/brainsci13091318
Academic Editors: Rodrigo Pena, Paulo R. Protachevicz and Ricardo F. Ferreira
Received: 25 July 2023
Revised: 4 September 2023
Accepted: 10 September 2023
Published: 14 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Brain Sciences
Review
Data Mining of Microarray Datasets in Translational Neuroscience
Lance M. O’Connor 1, Blake A. O’Connor 2, Jialiu Zeng 3 and Chih Hung Lo 3,*
1 College of Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA; ocon0436@umn.edu
2 School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA; baoconnor2@wisc.edu
3 Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore; jialiu.zeng@ntu.edu.sg
* Correspondence: chihhung.lo@ntu.edu.sg

Abstract: Data mining involves the computational analysis of a plethora of publicly available datasets to generate new hypotheses that can be further validated by experiments, improving the understanding of the pathogenesis of neurodegenerative diseases. Although the number of sequencing datasets is on the rise, microarray analyses conducted on diverse biological samples represent a large collection of datasets, supported by multiple web-based programs that enable efficient and convenient data analysis. In this review, we first discuss the selection of biological samples associated with neurological disorders and the possibility of combining datasets from various types of samples to conduct an integrated analysis and achieve a holistic understanding of the alterations in the examined biological system.
We then summarize key approaches and studies that have used data mining of microarray datasets to obtain insights into translational neuroscience applications, including biomarker discovery, therapeutic development, and the elucidation of the pathogenic mechanisms of neurodegenerative diseases. We further discuss the gap to be bridged between microarray and sequencing studies to improve the utilization and combination of different types of datasets, together with experimental validation, for more comprehensive analyses. We conclude by providing future perspectives on integrating multi-omics to advance precision phenotyping and personalized medicine for neurodegenerative diseases.

Keywords: microarray analysis; biological samples; messenger RNA (mRNA); microRNA (miRNA); circular RNA (circRNA); long non-coding RNA (lncRNA); multi-omics integration; translational neuroscience; biomarker discovery; therapeutic development

1. Introduction

Over the past few decades, methods for quantifying the transcriptome have developed and expanded from microarray gene expression analysis and quantitative polymerase chain reaction [1,2] to bulk RNA-seq and single-cell or single-nucleus RNA sequencing (sc/snRNA-seq) [3]. RNA-seq techniques have been at the forefront of studies aimed at understanding the heterogeneity of neurological diseases, including Alzheimer’s disease (AD), Parkinson’s disease (PD), and multiple sclerosis (MS) [3,4]. They also have the unique ability to detect novel sequences and splice variants [3,5]. However, RNA-seq methods are generally more labor intensive in data analysis and less cost efficient in terms of data storage, and they may exhibit transcript length bias, which is currently being mitigated by long-read sequencing [5]. Although microarray gene expression analysis is limited to transcripts that are already established for the model organism being analyzed, it is able to detect highly varied genes [6].
Despite the technical differences, results from microarray and RNA-seq analyses have been shown to be highly consistent with each other [7]. In the context of data mining, microarray analysis is still widely adopted due to its low cost, high efficiency, limited bias [8], greater statistical power [9], and the vast number of publicly available neuroscience datasets [3,10].