Original Article

PRE-PROCESSING TECHNIQUES FOR INTEGRATED ADVERSE DRUG REACTION DATASETS

SHIKSHA DUBEY*, SHIRSHENDU MAITRA

Department of Computer Application, Thakur Institute of Management, Career Development and Research (TIMSCDR), Mumbai-400101, India

*Corresponding author: Shiksha Dubey; *Email: shiksha.dubey@timscdrmumbai.in

Received: 12 Apr 2025, Revised and Accepted: 17 Jun 2025

ABSTRACT

Objective: To integrate and preprocess datasets from the FDA adverse event reporting system (FAERS), side effect resource (SIDER), DrugBank, and PubChem to extract meaningful insights into drug interactions, adverse events, and molecular properties, thereby supporting drug discovery and pharmacovigilance.

Methods: The study implements a preprocessing pipeline that includes data cleaning, normalization, and harmonization to ensure consistency across the diverse datasets. Standardization of drug nomenclature and handling of missing or inconsistent information are emphasized. The integrated data is then subjected to exploratory data analysis and advanced visualization techniques to uncover patterns and correlations within the data.

Results: The integration and preprocessing of the datasets improved the consistency and quality of the drug-related data. Exploratory analysis revealed patterns and potential associations among drugs, adverse events, and molecular features. Visualization tools effectively conveyed complex relationships and significant trends, enhancing interpretability.

Conclusion: The study successfully demonstrates that integrating and preprocessing multiple drug-related datasets improves data quality and facilitates comprehensive analysis. The resulting resource supports better-informed decision-making in drug development and pharmacovigilance by enabling a deeper understanding of drug interactions and safety profiles.

Keywords: Adverse drug reactions (ADRs), Drugs, Pharmacology

© 2025 The Authors. Published by Innovare Academic Sciences Pvt Ltd.
This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/)
DOI: https://dx.doi.org/10.22159/ijpps.2025v17i8.54565
Journal homepage: https://innovareacademics.in/journals/index.php/ijpps

INTRODUCTION

Drug safety is a major goal of the healthcare and pharmaceutical industry today [1]. Correct dosage and proper care are important for improving health and enhancing human well-being. All drugs currently developed carry associated risks, and only drugs whose therapeutic benefit outweighs the associated risk are prescribed as medicines. Adverse drug reactions (ADRs) are unwanted, drug-related effects on human health. ADRs are categorized as a subset of adverse drug events, which in turn fall under the category of medication errors, as shown in fig. 1.

Fig. 1: Framework to categorize adverse drug reactions

An ADR, as per the International Council for Harmonization (ICH) definition, refers to "all harmful and unintended responses to a medicinal product at any dose" [2, 3]. This definition implies a potential causal link between the medicinal product and the adverse reaction. Since most drugs undergo clinical testing before market release [4], ADRs primarily manifest during the post-marketing phase of the drug lifecycle. These adverse reactions constitute a major cause of human mortality and morbidity. Studies indicate that ADRs contribute significantly to hospitalization and rank as the fifth leading cause of mortality in hospitals [5]. Increased healthcare costs and prolonged hospital stays reflect the severity of ADRs.

Numerous factors contribute to the occurrence of ADRs in humans; they can be categorized as patient-related, drug-related, and environmental. Patient-related factors include age and gender, which are crucial in assessing individual susceptibility to ADRs. Drug-related factors encompass dosage and interactions with other medications.
Social factors such as smoking and alcohol consumption also influence ADR development, underscoring the importance of these factors for patient safety and healthcare analysis [6].

The history of ADR research dates back to notable incidents such as the "Sulphanilamide Disaster" in the USA in 1937, which led to over 100 deaths from renal failure, and the "Thalidomide Disaster" in Germany in 1957, which caused severe birth defects such as phocomelia [7, 8]. These events prompted the establishment of ADR monitoring centers worldwide to monitor and report adverse drug reactions effectively.

Literature review

Various techniques are currently employed to preprocess unstructured adverse drug reaction (ADR) data for research purposes. Friedrich et al. [9] have extensively detailed normalization techniques using dictionaries. Similarly, Louis et al. [10] encoded ADRs into standardized naming conventions to enhance the performance of machine learning models. Text mining and natural language processing techniques are applied to convert unstructured data into a machine-readable format suitable for training and prediction.

Over the past two decades, numerous datasets have been created to document incidents of adverse drug reactions. During this period, countries have developed pharmacovigilance centers responsible for collecting ADR reports, mainly from medical practitioners and healthcare professionals. These centers play a critical role in the post-marketing surveillance of ADRs. Fig. 2 provides a brief overview of various sources of ADR-related data.

International Journal of Pharmacy and Pharmaceutical Sciences
Print ISSN: 2656-0097 | Online ISSN: 0975-1491
Vol 17, Issue 8, 2025