Systems biology

Drug-induced adverse events prediction with the LINCS L1000 data

Zichen Wang, Neil R. Clark and Avi Ma’ayan*

Department of Pharmacology and Systems Therapeutics, One Gustave L. Levy Place Box 1215, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

*To whom correspondence should be addressed.
Associate Editor: Alfonso Valencia

Received on October 20, 2015; revised on March 5, 2016; accepted on March 23, 2016

Abstract

Motivation: Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data are from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, which measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds, including most of the FDA-approved drugs. Using various benchmarking methods, we show that integrating GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features into enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can assist in understanding the drugs' mechanisms of action. Finally, we applied the classifier to all >20 000 small molecules profiled and developed a web portal for browsing and searching the predicted small-molecule/ADR connections.

Availability and Implementation: The interface for the adverse event predictions for the >20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/.

Contact: avi.maayan@mssm.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
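The abstract mentions transforming GE features into enrichment vectors of biological terms. The exact enrichment procedure is not described in this section, so the sketch below is only an illustration under stated assumptions: a compound's GE signature is treated as a set of differentially expressed genes, each biological term is a gene set, and each term is scored by the -log10 p-value of a one-sided hypergeometric overlap test. The function names and the gene-set mapping are hypothetical, and the hypergeometric test is a swapped-in, commonly used choice, not necessarily the method used in this work.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def enrichment_vector(signature_genes, gene_sets, background_size):
    """Turn one compound's GE signature into one enrichment score per biological term.

    signature_genes: genes called up- or down-regulated by the compound
        (assumed to be a subset of the background gene universe).
    gene_sets: hypothetical mapping {term name: set of member genes}.
    background_size: number of genes in the measured background universe.
    Score = -log10 of the one-sided hypergeometric p-value of the overlap.
    """
    sig = set(signature_genes)
    scores = {}
    for term in sorted(gene_sets):
        members = gene_sets[term]
        overlap = len(sig & members)
        p = hypergeom_sf(overlap, background_size, len(members), len(sig))
        # max() turns the -0.0 produced when p == 1.0 into a plain 0.0.
        scores[term] = max(0.0, -math.log10(p))
    return scores
```

For example, with a toy 20-gene background, `enrichment_vector({"g0", "g1", "g2", "g3"}, {"A": {"g0", "g1", "g2", "g3", "g4"}}, 20)` gives term `A` a high score because all four signature genes fall inside it. Stacking such score dictionaries for many compounds (in a fixed term order) yields a compound-by-term feature table that can replace or complement the raw GE features.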
1 Introduction

Adverse drug reactions (ADRs) are harmful or unpleasant unintended side effects resulting from drug intervention (Edwards and Aronson, 2000). ADRs are a major concern for both public health and the drug development process. Failure to identify severe ADRs in clinical trials can lead to significant morbidity, and drugs withdrawn from the market can carry a substantial negative economic impact (Giacomini et al., 2007). Despite such harmful consequences, ADRs can provide useful information about drug/human–phenotype relationships (Kuhn et al., 2010). ADRs and human diseases often overlap. For example, common ADR phenotypes, such as neuropathy or long-QT syndrome, can also manifest as inherited genetic diseases (Kuhn et al., 2010). Therefore, a better understanding of drug/human–phenotype connections from a network perspective can lead to an improved understanding of human diseases.

Various efforts have been made to predict ADRs using the properties/attributes of drugs. The drug attributes used so far to predict ADRs generally fall into two categories: (i) those that mostly consider the chemical aspects of the drug, and (ii) those that consider its biological aspects. Once such attributes are organized into attribute tables, a binary classification problem can be established for each ADR. Initially, chemical features of drugs alone were used to predict associations between drugs and their known ADRs, under the assumption that ADRs may correlate with the chemical fragments of the drugs that induce them (Pauwels et al., 2011; Scheiber et al., 2009). For example, Scheiber et al. (2009) mapped the chemical substructures of drugs, described by extended connectivity fingerprints (ECFPs), to ADRs grouped into the different system organ classes defined by the Medical Dictionary for Regulatory Activities (MedDRA) (Brown et al., 1999).

© The Author 2016. Published by Oxford University Press. All rights reserved.
For Permissions, please e-mail: journals.permissions@oup.com

Bioinformatics, 2016, 1–8
doi: 10.1093/bioinformatics/btw168
Advance Access Publication Date: 1 April 2016
Original Paper
Bioinformatics Advance Access published April 20, 2016
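The Introduction frames ADR prediction as one binary classification problem per ADR over an attribute table of drug features. The stdlib-only Python sketch below illustrates that framing, assuming each drug's row is a CS fingerprint bit vector (ECFP-style bits) concatenated with numeric GE features, and that known drug/ADR associations are given as a set of pairs. All function names, drug names and the ADR label are hypothetical, and the classifier is plain batch gradient-descent logistic regression used only as a stand-in; it is not necessarily the classifier used in this work.

```python
import math


def concat_features(cs_bits, ge_values):
    """One attribute-table row: CS fingerprint bits followed by GE features."""
    return [float(b) for b in cs_bits] + [float(v) for v in ge_values]


def labels_for_adr(drugs, known_pairs, adr):
    """Binary label per drug: 1 if (drug, adr) is a known association, else 0."""
    return [1 if (d, adr) in known_pairs else 0 for d in drugs]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Stand-in classifier: logistic regression fit by batch gradient descent."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(model, x):
    """Predicted probability that a drug with feature row x causes the ADR."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_per_adr(table, known_pairs, adrs):
    """Fit one independent binary classifier per ADR over the same attribute table."""
    drugs = sorted(table)
    X = [table[d] for d in drugs]
    return {adr: train_logistic(X, labels_for_adr(drugs, known_pairs, adr)) for adr in adrs}


# Toy usage: 3 CS bits + 2 GE values per drug, one hypothetical ADR label.
table = {
    "drugA": concat_features([1, 0, 1], [2.0, -1.0]),
    "drugB": concat_features([1, 1, 0], [1.5, -0.5]),
    "drugC": concat_features([0, 0, 1], [-2.0, 1.0]),
    "drugD": concat_features([0, 1, 0], [-1.5, 0.5]),
}
known = {("drugA", "nausea"), ("drugB", "nausea")}
models = train_per_adr(table, known, ["nausea"])
print(round(predict_proba(models["nausea"], table["drugA"]), 3))
```

Because each ADR gets its own model, a new compound is scored by building its row with `concat_features` and calling `predict_proba` once per ADR; sorting those probabilities gives a ranked list of candidate ADRs for that compound, which is the kind of prioritization described in the abstract.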