Neural Multi-Task Learning for Adverse Drug Reaction Extraction

Feifan Liu, PhD 1, Xiaoyu Zheng, MS 1, Hong Yu, PhD 2, Jennifer Tjia, MD 1
1 University of Massachusetts Medical School, Worcester, MA, USA; 2 University of Massachusetts Lowell, Lowell, MA, USA

Abstract

A reliable and searchable knowledge base of adverse drug reactions (ADRs) is valuable for improving patient safety at the point of care. In this paper, we propose NeuroADR, a neural multi-task learning system that extracts ADRs and their relevant modifiers from free-text drug labels. Specifically, NeuroADR uses a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, exploring the interactions among the deep encoder representations learned for the different subtasks. Unlike the conventional HMTL approach, NeuroADR adopts a novel task decomposition strategy that generates auxiliary subtasks for more inter-task interactions, and it integrates a new label encoding schema to better handle discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.

Introduction

Drug labels are intended to provide health care professionals with clear and concise prescribing information that enhances patient safety at the point of care 1. However, this important information is stored in an unstructured format, which greatly limits its usefulness in real-life clinical practice. Automatic extraction of ADRs and their relevant properties from narrative drug labels has therefore drawn increasing attention from the pharmacovigilance community 2, the natural language processing community 3, and the government 4. In 2017, the U.S. Food and Drug Administration (FDA) and the U.S.
National Library of Medicine (NLM) jointly organized a shared task entitled “Adverse Drug Reaction Extraction from Drug Labels” at the Text Analysis Conference (TAC-ADR), which further advanced text mining techniques for extracting ADRs from drug labels 5. Our study focuses on the named entity recognition (NER) task, i.e., extracting ADRs and related concept modifiers (Severity, Factors, DrugClass, Negation, Animal), and the relation extraction (RE) task, i.e., identifying the relations (Negated, Hypothetical, Effect) between ADRs and related concept modifiers. For example, in the text “Grade 3 cutaneous reactions”, “Grade 3” is a Severity, “cutaneous reactions” is an ADR, and there is an “Effect” relation between them.

Most existing systems for ADR extraction use deep learning approaches, which have shown promising results in many natural language processing (NLP) tasks 6. For instance, Saldana explored convolutional neural networks (CNNs) for detecting ADR-relevant sentences 7, and Alimova et al. used an interactive attention neural network (IAN) to detect ADRs in biomedical texts 8. However, effectively training deep neural networks usually requires millions of labeled samples, which are often prohibitively expensive to obtain in real-life applications 9. To address this challenge, semi-supervised methods based on co-training 10 and on neural network pre-training 11 have been proposed for extracting ADR mentions from tweets. A popular alternative is multi-task learning (MTL) 12, which has been widely and successfully applied across machine learning, including speech recognition 13, NLP 14, and computer vision 15. MTL has also been applied to ADR extraction from social media texts 16,17.
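To make the two subtasks concrete, the “Grade 3 cutaneous reactions” example can be encoded for NER with the widely used BIO tagging scheme, and the RE output can be read as a (head, relation, tail) triple over the decoded mentions. The sketch below is an illustration under our own assumptions, not NeuroADR's label encoding: it uses plain BIO tags, whitespace tokenization, an ADR-first argument order for the triple, and the hypothetical helpers `bio_tags` and `spans_from_tags`. Plain BIO can only represent contiguous spans, which is exactly the limitation that discontinuous ADR mentions expose.

```python
from typing import List, Optional, Tuple

# Hypothetical span annotation: (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_tags(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode contiguous entity spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def spans_from_tags(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, type) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the "O" sentinel flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current span keeps growing
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]  # an orphan I- tag opens a new span
    return spans


tokens = "Grade 3 cutaneous reactions".split()
gold = [(0, 2, "Severity"), (2, 4, "ADR")]
tags = bio_tags(len(tokens), gold)
print(tags)  # ['B-Severity', 'I-Severity', 'B-ADR', 'I-ADR']

decoded = spans_from_tags(tags)
mentions = {label: " ".join(tokens[s:e]) for s, e, label in decoded}
# The RE subtask then classifies pairs of mentions; here we assume ADR-first order.
relation = (mentions["ADR"], "Effect", mentions["Severity"])
print(relation)  # ('cutaneous reactions', 'Effect', 'Grade 3')
```

A discontinuous mention (an ADR whose tokens are interrupted by other words) would decode into two separate spans under this scheme, which is why a dedicated label encoding is needed for such cases.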
Several prior works have developed hierarchical MTL 18–20, which integrates supervised feedback from each task at a different level of a model hierarchy and achieves better performance than traditional MTL approaches. Hierarchical MTL can be seen as a seamless way to combine multi-task and cascaded learning, which is especially helpful in NLP, where low-level tasks feed into high-level ones 18. More recently, Sanh et al. demonstrated the effectiveness of a hierarchically supervised multi-task learning model on four related semantic tasks without complex regularization schemes 21. Although conventional MTL approaches with shared components have been successfully applied in the biomedical domain 22,23, hierarchical MTL remains an untapped but promising area in biomedical applications. It is still unclear how effective hierarchical MTL is and what adaptations are needed to increase its potential without compromising system generalizability in the biomedical domain. In this work, we propose a new hierarchical MTL system, NeuroADR, for efficient ADR extraction from narrative drug labels. Unlike the top-performing system at TAC-ADR 2017, NeuroADR is end-to-end trainable and achieves comparable performance without relying on any handcrafted heuristic rules.
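The hierarchical MTL idea described above, a low-level task (here NER) supervised at a lower layer and a high-level task (here RE) supervised at a higher layer that reads the lower layer's output, with one objective summing the per-task losses, can be sketched without any deep learning framework. This is a minimal illustration of the wiring only, not NeuroADR's architecture. Assumptions: the hypothetical `HierarchicalMTL` class uses randomly initialized one-layer tanh encoders, a single input vector stands in for a token or candidate-pair representation, the RE encoder receives the input embedding concatenated with the NER representation (an assumed shortcut connection), and no gradients or training loop are shown.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def tanh(v: Vector) -> Vector:
    return [math.tanh(a) for a in v]


def cross_entropy(logits: Vector, gold: int) -> float:
    """Negative log-softmax probability of the gold class (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(a - m) for a in logits))
    return log_z - logits[gold]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class HierarchicalMTL:
    """Hypothetical two-level hierarchy: NER (low level) feeds RE (high level)."""

    def __init__(self, emb_dim: int, hid: int, n_ner: int, n_re: int, seed: int = 0):
        rng = random.Random(seed)
        self.ner_enc = random_matrix(hid, emb_dim, rng)   # low-level encoder
        self.ner_head = random_matrix(n_ner, hid, rng)    # NER classifier
        # Assumed shortcut: the RE encoder sees [embedding ; NER representation].
        self.re_enc = random_matrix(hid, emb_dim + hid, rng)
        self.re_head = random_matrix(n_re, hid, rng)      # RE classifier

    def forward(self, emb: Vector) -> Tuple[Vector, Vector]:
        h_ner = tanh(linear(self.ner_enc, emb))
        h_re = tanh(linear(self.re_enc, emb + h_ner))  # list concatenation
        return linear(self.ner_head, h_ner), linear(self.re_head, h_re)

    def joint_loss(self, emb: Vector, ner_gold: int, re_gold: int,
                   ner_weight: float = 1.0, re_weight: float = 1.0) -> float:
        """Weighted sum of per-task losses: one objective supervises both levels."""
        ner_logits, re_logits = self.forward(emb)
        return (ner_weight * cross_entropy(ner_logits, ner_gold)
                + re_weight * cross_entropy(re_logits, re_gold))


# Toy sizes chosen only for illustration.
model = HierarchicalMTL(emb_dim=4, hid=3, n_ner=5, n_re=4)
emb = [0.1, -0.2, 0.3, 0.05]  # stand-in for a learned representation
ner_logits, re_logits = model.forward(emb)
print(len(ner_logits), len(re_logits))  # 5 4
print(model.joint_loss(emb, ner_gold=1, re_gold=2))
```

Because the RE encoder consumes the NER representation, gradients from the RE loss would also flow into the NER encoder during training, which is the cascaded supervision that distinguishes hierarchical MTL from flat parameter sharing. In a real system the encoders would be trainable sequence models optimized with automatic differentiation, and the per-task weights are an assumed knob for balancing the two supervision signals.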