A Predictive Model for Drug-Drug Interaction Using a Similarity Measure

Abirami Ariyur Mahadevan, Anagha Vishnuvajjala, Naman Dosi, Shrisha Rao

Abstract—A drug-drug interaction can affect a patient when a second drug is administered during the duration of action of the first. It may delay or decrease the rate of absorption of a drug, or enhance its absorption. This in turn may alter the action of the drugs or induce adverse effects in patients. There is thus a need to study drug-drug interactions and their potential effects on the human system, including those involving drugs not yet approved. This paper proposes using eight features (substructure, targets, transporters, enzymes, pathways, indications, side effects, and off-side effects, obtained from five databases: PubChem, DrugBank, KEGG, SIDER, and OFFSIDES) and a similarity-based ensemble prediction model to identify potential drug-drug interactions. The proposed ensemble model uses Jaccard's coefficient to measure the similarity between drugs. These similarity indices are given to a neighbor recommender method and a random walk method for the base prediction of drug-drug interactions. This predictive model is then improved by an ensemble model that uses a genetic algorithm to calculate weights and logistic regression for classification. The empirical results show that the ensemble model yields >90% accuracy in predicting drug-drug interactions.

Index Terms—Drug-drug interactions, machine learning, similarity measures, Jaccard's coefficient, ensemble model, random walk method, neighbor recommender method

I. INTRODUCTION

Drugs are constantly being sought to fight diseases. However, drugs have serious downsides, in terms of side effects as well as interactions with other drugs. Drug-drug interactions (DDIs) are very common.
Some are beneficial to patients, such as antidotes administered after overdoses and drugs used to combat the undesirable side effects of other drugs used to treat serious diseases. However, some are potentially harmful and have to be identified at an early stage, while others carry low risk and may be of little clinical significance. Clinically checking DDIs for every pair of drugs can take years, and some DDIs may go undetected in clinical trials. Moreover, it may take many years to check a newly discovered drug for DDIs with all known drugs before it is introduced to the market. Hence, there is a need for efficient DDI checking that requires fewer time-consuming, expensive, and risky clinical trials.

A drug-drug interaction occurs when two co-administered drugs interact and cause an adverse reaction or unexpected side effects. It can be caused by prescribed medicines, overdose, and/or prolonged use of medicines. Between 2009 and 2012, 38.1% of U.S. adults aged 18–44 used three or more prescription drugs during a 30-day period [1]. This percentage increases substantially with age, reaching 67.2% for ages 45–64 and 89.8% for ages 65 and older. The number of adverse drug reaction incidents increases exponentially when a patient takes four or more drugs [2]. However, identifying all possible interactions among all drugs is computationally intractable. DDI detection and remediation require domain knowledge and the competence to act without causing undue mental stress to patients and caregivers. Investigations to clinically observe drug interactions are undertaken before marketing, and may help pharmaceutical companies as well as physicians gain confidence about drugs. Labor-intensive techniques such as in silico methods, in vitro methods, in vivo experiments, and clinical trials may identify DDIs, but they are time-consuming [3].
Statistical and machine learning methods have been developed to detect adverse drug reactions and drug-drug interactions by analyzing health reports and records. Researchers have also used drug data from the literature and health reports to create public databases that facilitate the development of classification and prediction methods [3]. Testing all drugs under all possible conditions is impractical and also unethical; hence, machine learning is sought as an alternative.

Machine learning methods for predicting DDIs follow two approaches: similarity-based methods and classification-based methods. In either case, a model is built to analyze how drugs interact with other drugs and is then used to predict how a new drug would interact with a known one. Similarity-based models assume that similar drugs interact, leading to DDIs. Classification-based models treat DDI prediction as a binary classification task that uses two kinds of data: drug pairs that cause DDIs and drug pairs that do not. In this binary classification, known interactions between drug pairs are given positive labels, and the interactions between the remaining drug pairs are to be detected using the prediction model. In this paper, we choose a similarity-based approach because the consequences (side effects) of two drugs often add up and lead to a DDI. Similar drugs may also work in similar ways, leading to a DDI because the body cannot sustain both drugs at the same time. The Anatomical Therapeutic Chemical (ATC) classification system has been used to characterize adverse drug-drug interactions and predict their potential interactions [4].

978-1-7281-1462-0/19/$31.00 ©2019 IEEE
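As a concrete illustration of the similarity-based approach, the Python sketch below computes Jaccard's coefficient between two drugs, each represented as a set of feature identifiers (for example, targets or enzymes), and then scores a candidate pair in the spirit of a neighbor recommender: drugs similar to a given drug "vote", weighted by their similarity, on whether it interacts with a partner drug. The score formula, the function names `jaccard` and `neighbor_score`, and the toy drug and feature names are assumptions chosen for illustration, not the authors' implementation.

```python
# Minimal sketch: Jaccard similarity between drugs plus a neighbor-recommender-style
# score. Drug names, feature IDs, and the score formula are hypothetical.
from __future__ import annotations


def jaccard(a: set, b: set) -> float:
    """Jaccard's coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def neighbor_score(drug: str, partner: str,
                   features: dict[str, set], known_ddis: set[frozenset]) -> float:
    """Similarity-weighted fraction of `drug`'s neighbors known to interact with `partner`.

    Assumed formulation: sum_k S(drug, k) * A(k, partner) / sum_k S(drug, k),
    summed over every drug k other than `drug`, where S is Jaccard similarity
    and A(k, partner) is 1 if the pair is a known DDI, else 0.
    """
    weighted_hits = 0.0
    total_similarity = 0.0
    for other, other_features in features.items():
        if other == drug:
            continue  # a drug is not its own neighbor
        s = jaccard(features[drug], other_features)
        total_similarity += s
        if frozenset((other, partner)) in known_ddis:
            weighted_hits += s
    # With no similar neighbors there is no evidence either way, so score 0.0.
    return weighted_hits / total_similarity if total_similarity > 0 else 0.0


# Hypothetical toy data: each drug is a set of target/enzyme identifiers.
FEATURES = {
    "drugA": {"T1", "T2", "E1"},
    "drugB": {"T1", "T2", "E2"},
    "drugC": {"T3", "E3"},
    "drugD": {"T4"},
}
KNOWN_DDIS = {frozenset(("drugB", "drugD"))}

if __name__ == "__main__":
    print(jaccard(FEATURES["drugA"], FEATURES["drugB"]))            # 0.5
    print(neighbor_score("drugA", "drugD", FEATURES, KNOWN_DDIS))   # 1.0
    print(neighbor_score("drugC", "drugD", FEATURES, KNOWN_DDIS))   # 0.0
```

Here drugA shares half of its features with drugB, and drugB is known to interact with drugD, so the pair (drugA, drugD) receives a high score; drugC has no similar neighbors and scores 0.0. Ranking all unlabeled pairs by such a score would yield candidate DDIs; the random walk method, genetic-algorithm weighting, and logistic-regression ensemble named in the abstract are not shown.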