2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

© IEEE 2021. This article is free to access and download, along with rights for full text and data mining, re-use and analysis.

Predicting Drugs for COVID-19/SARS-CoV-2 via Heterogeneous Graph Attention Networks

Yahui Long 1,2, Yu Zhang 2, Min Wu 3, Shaoliang Peng 1, Chee Keong Kwoh 2, Jiawei Luo 1, and Xiaoli Li 3

1 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China
2 School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
3 Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
Corresponding Authors: luojiawei@hnu.edu.cn and xlli@i2r.a-star.edu.sg

Abstract—Coronavirus Disease-19 (COVID-19) has led to global epidemics with high morbidity and mortality. However, there are currently no proven effective drugs targeting COVID-19. Identifying drug-virus associations can not only provide insights into the understanding of drug-virus interaction mechanisms, but also guide and facilitate the screening of compound candidates for antiviral drug discovery. In this work, we propose a novel framework of Heterogeneous Graph Attention Networks for Drug-Virus Association prediction, named HGATDVA. First, we incorporate multiple sources of biomedical data to construct rich features for drugs and viruses. Second, we construct two drug-virus heterogeneous graphs. For each graph, we design a self-enhanced graph attention network (SGAT) to explicitly model the dependency between a node and its local neighbors and derive the graph-specific representations for nodes. Third, we further develop a neural network architecture with a tri-aggregator to aggregate the graph-specific representations to generate the final node representations.
Experiments on two datasets were conducted to demonstrate the effectiveness of our proposed method in identifying candidate drugs for viruses.

Index Terms—COVID-19, Drug, Heterogeneous graph attention networks, Association prediction.

I. INTRODUCTION

Coronavirus Disease-19 (COVID-19) is an infectious disease caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) [1]. SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA betacoronavirus of the family Coronaviridae [2], [3]. Coronaviruses (CoVs) typically affect the respiratory tract of mammals, including humans, and lead to mild to severe respiratory tract infections [4]. COVID-19 has led to global epidemics with high morbidity and mortality. However, there are currently no antiviral drugs with proven clinical efficacy for the treatment of COVID-19. Identifying drug-virus associations is very useful for drug development, as well as disease prevention and treatment. Considering that conventional experimental methods are time-consuming, laborious and expensive, computational methods provide a low-cost complement and can guide the screening of candidate compounds for drug discovery.

More recently, several computational methods have been proposed for drug-microbe association prediction. For example, Zhu et al. [5] presented a KATZ-based method named HMDAKATZ for drug-microbe predictions using drug chemical similarity and Gaussian kernel similarity. Long et al. [6] proposed a novel computational model named HNERMDA to predict drug-microbe associations based on a heterogeneous network. Following that, Long et al. [7] developed another prediction model called GCNMDA to infer latent drug-microbe associations by combining microbial protein interactions with drug chemical information. However, none of the above methods fully considers the biological knowledge associated with viruses. Very recently, Andersen et al.
[8] released a comprehensive database called DrugVirus that records experimentally and clinically validated drug-virus associations. In addition, there are many other available knowledge databases for drugs and viruses, such as DrugBank [9], UniProt [10] and VirHostNet [11]. The availability of these rich data provides a golden opportunity for us to develop deep learning methods for drug-virus association predictions. In particular, graph neural networks, e.g., the graph attention network (GAT) proposed by Veličković et al. [12], are a promising deep learning technique due to their powerful capability for modeling graph-structured data.

However, there exist several challenges when using GAT for drug-virus predictions. First, biological data related to drugs and viruses are often heterogeneous, and data from different sources carry distinct biological meanings. Thus it is a challenge to integrate them as effective input features in a GAT framework for drug-virus predictions. Second, the observed/known drug-virus associations are limited and sparse, which makes it difficult to use GAT to model drug-virus associations.

To address these issues, we developed a novel Heterogeneous Graph Attention Network (HGAT) based framework for Drug-Virus Association prediction (HGATDVA). In particular, we first built two networks/graphs for drug-virus prediction, i.e., a drug-virus heterogeneous network with known drug-virus associations and a drug-host-virus heterogeneous network built by integrating drug-target interactions with virus-host (human) protein interactions. Then we exploited multiple biomedical data, e.g., virus genome sequences, drug chemical structure information, viral protein sequences, drug-drug interactions, etc., to derive input features for drugs, viruses and proteins. For each graph, we designed a self-enhanced attention mechanism to learn a graph-specific representation for each node.
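As a point of reference for the attention mechanism mentioned above, the following is a minimal sketch of one single-head layer of the standard GAT of [12], written in plain Python. It is not the self-enhanced variant (SGAT), whose details are not given in this section. The function names, the toy graph, and all numeric values are hypothetical and chosen only for illustration, and the output nonlinearity of the original GAT is omitted for brevity.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]

def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer (standard GAT, not the paper's SGAT).

    features:  list of node feature vectors
    neighbors: neighbors[i] lists the nodes that node i attends to
               (include i itself to model a self-loop); must be non-empty
    W:         shared linear projection, shape (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    Returns (new_features, attention), where attention[i][j] is the
    normalized weight that node i places on neighbor j.
    """
    z = [matvec(W, h) for h in features]  # projected features z_i = W h_i
    new_features, attention = [], []
    for i, nbrs in enumerate(neighbors):
        # Unnormalized score e_ij = LeakyReLU(a^T [z_i || z_j])
        scores = [
            leaky_relu(sum(a_k * v for a_k, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighborhood (shifted by the max for stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted sum of projected neighbor features
        out = [
            sum(al * z[j][k] for al, j in zip(alpha, nbrs))
            for k in range(len(z[i]))
        ]
        new_features.append(out)
        attention.append(dict(zip(nbrs, alpha)))
    return new_features, attention

# Hypothetical toy graph: node 0 plays the role of a drug, nodes 1 and 2 of viruses.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1, 2], [0, 1], [0, 2]]
W = [[0.5, -0.2], [0.1, 0.3]]   # illustrative values, not learned
a = [0.4, -0.1, 0.2, 0.3]       # illustrative values, not learned
new_features, attention = gat_layer(features, neighbors, W, a)
```

In this sketch each node's attention weights are normalized over its own neighborhood, so every entry of `attention` sums to one. In practice `W` and `a` would be learned by gradient descent rather than fixed by hand.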
We further developed a multilayer perceptron (MLP) based tri-aggregator to combine graph-specific representations and thus

978-1-7281-6215-7/20/$31.00 ©2020 IEEE DOI: 10.1109/BIBM49941.2020.9313472