Original Article
PRE-PROCESSING TECHNIQUES FOR INTEGRATED ADVERSE DRUG REACTION DATASETS
SHIKSHA DUBEY*, SHIRSHENDU MAITRA
Department of Computer Application, Thakur Institute of Management, Career Development and Research (TIMSCDR), Mumbai-400101, India
*Corresponding author: Shiksha Dubey; Email: shiksha.dubey@timscdrmumbai.in
Received: 12 Apr 2025, Revised and Accepted: 17 Jun 2025
ABSTRACT
Objective: To integrate and preprocess datasets from the FDA adverse event reporting system (FAERS), side effect resource (SIDER), DrugBank,
and PubChem to extract meaningful insights into drug interactions, adverse events, and molecular properties, thereby supporting drug discovery
and pharmacovigilance.
Methods: The study implements a preprocessing pipeline that includes data cleaning, normalization, and harmonization to ensure consistency
across the diverse datasets. Standardization of drug nomenclature and handling of missing or inconsistent information are emphasized. The
integrated data is then subjected to exploratory data analysis and advanced visualization techniques to uncover patterns and correlations within the
data.
Results: The integration and preprocessing of the datasets improved the consistency and quality of the drug-related data. Exploratory analysis
revealed patterns and potential associations among drugs, adverse events, and molecular features. Visualization tools effectively conveyed complex
relationships and significant trends, enhancing interpretability.
Conclusion: The study successfully demonstrates that integrating and preprocessing multiple drug-related datasets improves data quality and
facilitates comprehensive analysis. The resulting resource supports better-informed decision-making in drug development and pharmacovigilance
by enabling a deeper understanding of drug interactions and safety profiles.
Keywords: Adverse drug reactions (ADRs), Drugs, Pharmacology
© 2025 The Authors. Published by Innovare Academic Sciences Pvt Ltd. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/)
DOI: https://dx.doi.org/10.22159/ijpps.2025v17i8.54565 Journal homepage: https://innovareacademics.in/journals/index.php/ijpps
INTRODUCTION
Drug safety is a major goal of the healthcare and pharmaceutical
industries today [1]. Correct dosing and proper care are important
for improving health and human well-being. Every drug carries some
associated risk, so only drugs whose therapeutic benefit outweighs
that risk are prescribed as medicines. Adverse drug reactions (ADRs)
are unwanted, harmful effects of a drug on human health. ADRs are
categorized as part of adverse drug events (ADEs), which in turn are
included in the subcategory of medication errors, as shown in fig. 1.
Fig. 1: Framework to categorize adverse drug reaction
As defined by the International Council for Harmonization (ICH), an
ADR is "all harmful and unintended responses to a medicinal product
at any dose" [2, 3]. This definition implies a potential causal link
between the medicinal product and the adverse reaction. Although most
drugs undergo clinical testing before market release [4], such trials
involve a limited number of patients, so many ADRs first become
apparent during the post-marketing phase of a drug's lifecycle. These
reactions are a major cause of morbidity and mortality. Studies
indicate that ADRs contribute significantly to hospitalization and
rank as the fifth leading cause of death in hospitals [5]. The burden
of ADRs is further reflected in increased healthcare costs and
prolonged hospital stays.
Numerous factors contribute to the occurrence of ADRs. They can be
categorized as patient-related, drug-related, and environmental.
Patient-related factors such as age and gender are crucial in
assessing individual susceptibility to ADRs. Drug-related factors
include dosage and interactions with other medications. Environmental
(social) factors such as smoking and alcohol consumption also
influence ADR development. Understanding these factors is therefore
important for patient safety and healthcare analysis [6]. The history
of ADR research dates back to notable incidents. One was the
"Sulphanilamide Disaster" in the USA in 1937, which led to over 100
deaths from renal failure. Another was the "Thalidomide Disaster" in
Germany in 1957, which caused severe birth defects such as phocomelia
[7, 8]. These events prompted the establishment of ADR monitoring
centers worldwide to monitor and report adverse drug reactions
effectively.
Literature review
Various techniques are currently employed to preprocess unstructured
ADR data for research purposes. Friedrich et al. [9] described
dictionary-based normalization techniques in detail. Louis et al.
[10] encoded ADRs into standardized naming conventions to improve the
performance of machine learning models. Text mining and natural
language processing techniques are applied to convert unstructured
data into a machine-readable format suitable for model training and
prediction. Over the past two decades, numerous datasets have been
created to document incidents of adverse drug reactions. During this
period, many countries have established pharmacovigilance centers
that collect ADR reports, mainly from medical practitioners and other
healthcare professionals. These centers play a critical role in the
post-marketing surveillance of ADRs.
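Dictionary-based normalization of free-text ADR mentions can be illustrated with a short Python sketch. It is not the method of Friedrich et al. [9] or Louis et al. [10]. The synonym table, the function names `clean` and `normalize_adr_mentions`, and the greedy longest-match strategy are hypothetical choices made only for illustration. A real pipeline would load its preferred terms from a curated terminology rather than a hand-written table.

```python
import re

# Hypothetical synonym dictionary mapping surface variants to a preferred ADR term.
# Illustrative entries only; a real pipeline would load a curated terminology.
SYNONYMS = {
    "headache": "Headache",
    "head ache": "Headache",
    "cephalalgia": "Headache",
    "nausea": "Nausea",
    "feeling sick": "Nausea",
    "skin rash": "Rash",
    "rash": "Rash",
}

# Longest dictionary entry, in tokens, bounds the n-gram window.
MAX_NGRAM = max(len(key.split()) for key in SYNONYMS)


def clean(text: str) -> list[str]:
    """Lowercase, replace non-alphanumeric runs with spaces, and tokenize."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def normalize_adr_mentions(text: str) -> list[str]:
    """Map free-text mentions to preferred terms by greedy longest-match lookup.

    Returns each preferred term once, in order of first appearance.
    """
    tokens = clean(text)
    found: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first, then shorter ones.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SYNONYMS:
                term = SYNONYMS[phrase]
                if term not in found:
                    found.append(term)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no match at this position; advance one token


    return found


if __name__ == "__main__":
    report = "Patient reported a bad head-ache, feeling sick and a skin rash."
    print(normalize_adr_mentions(report))
```

Trying the longest phrase first means a multi-word mention such as "feeling sick" is matched as a single unit before any shorter sub-phrase is considered. The trade-off of any purely dictionary-based approach is that spelling variants absent from the dictionary are simply missed.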
Fig. 2 provides a brief overview of the various sources of ADR-related
data.
International Journal of Pharmacy and Pharmaceutical Sciences
Print ISSN: 2656-0097 | Online ISSN: 0975-1491 Vol 17, Issue 8, 2025