Fully Contextualized Biomedical NER

Ashim Gupta 1, Pawan Goyal 1, Sudeshna Sarkar 1, and Nandeshwar Gattu 2

1 Indian Institute of Technology Kharagpur, India
ashimgupta95@gmail.com, {pawang,sudeshna}@cse.iitkgp.ac.in
2 Excelra Knowledge Solutions, Hyderabad, India
nandu.gattu@excelra.com

Abstract. Recently, neural network architectures have outperformed traditional methods in biomedical named entity recognition. Borrowed from innovations in general text NER, these models fail to address two important problems: polysemy and the usage of acronyms across biomedical text. We hypothesize that a fully contextualized model, which uses contextualized representations along with context-dependent transition scores in the CRF, can alleviate this issue and further boost the tagger’s performance. Our experiments show that this architecture improves the state-of-the-art F1 score on 3 widely used biomedical corpora for NER. We also perform an analysis to understand the specific cases where our contextualized model is superior to a strong baseline.

1 Introduction

Biomedical Named Entity Recognition (NER) is a fundamental step in several downstream biomedical text mining and information extraction tasks, such as relation classification and co-reference resolution. Traditional biomedical NER systems [7, 8] have often relied on task-specific hand-crafted features. Recent neural network based architectures in the biomedical domain [15] have shown that comparable results can be achieved without these hand-engineered features, although performance still depends on the quality of the learned word representations [16]. Character embeddings and pre-trained distributed embeddings have been used to model complex syntactic and semantic characteristics of words. But these complementary embedding models fail to capture different word uses across different linguistic contexts (i.e., polysemy).
This problem is compounded in biomedical text by the ambiguous usage of words from general text [14] (e.g., column in general English means an upright pillar, while in a medical context it can mean the spine). Word representations obtained from training on biomedical corpora do not solve this problem because both usages, general English and biomedical, are generally present in the training text.

Another issue specific to the biomedical domain is the generous usage of abbreviations (e.g., gene/protein names like ALA, MEN 1) without explicit mention of their full forms. Neither character embeddings nor distributed word embeddings are effective in solving this issue. Character embeddings do not help because these abbreviations are mostly acronyms, in which all characters are capitalized irrespective of the entity type. Word embeddings generally fail because most of these acronyms fall outside their vocabulary.
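
The two failure modes described for acronyms can be illustrated with a small toy sketch. It is not the architecture or data used in this paper: the vocabulary, the entity labels, and the helper names `word_shape` and `lookup` are hypothetical, and the coarse word-shape function only stands in for the kind of surface signal a character-level model sees, not for a learned character embedding.

```python
# Toy illustration only. TOY_VOCAB, ACRONYMS and both helpers are
# hypothetical and are not taken from the paper's models or corpora.

def word_shape(token: str) -> str:
    """Map each character to a coarse class: X (upper), x (lower), d (digit)."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation as-is
    return "".join(out)

# A tiny stand-in for a pre-trained word-embedding vocabulary.
TOY_VOCAB = {"<unk>": 0, "the": 1, "protein": 2, "column": 3, "spine": 4}

def lookup(token: str) -> int:
    """Return the vocabulary index, falling back to <unk> for unseen tokens."""
    return TOY_VOCAB.get(token.lower(), TOY_VOCAB["<unk>"])

# Acronyms of different (illustrative) entity types.
ACRONYMS = {"ALA": "gene/protein", "ATP": "chemical", "HIV": "virus"}

# Surface shape is identical for every acronym, whatever its type.
shapes = {tok: word_shape(tok) for tok in ACRONYMS}
# Every acronym is out of vocabulary and collapses to the <unk> index.
ids = {tok: lookup(tok) for tok in ACRONYMS}

for tok, label in ACRONYMS.items():
    print(f"{tok:4s} type={label:13s} shape={shapes[tok]} vocab_id={ids[tok]}")
```

In this sketch neither signal separates the three acronyms: they share one shape and one vocabulary index, so any distinction between them has to come from the surrounding sentence context.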