Tumor Morphology Mentions Identification Using Deep Learning and Conditional Random Fields

Utpal Kumar Sikdar a, Björn Gambäck b and M Krishna Kumar c

a IBS Software Pvt. Ltd., Trivandrum, Techopark main gate, India-695581
b Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
c IBS Software Pvt. Ltd., Trivandrum, Techopark main gate, India-695581

Abstract
The paper reports the application of several machine learning methods to the task of automatically finding tumor morphology mentions in Spanish clinical texts. Three setups based on Conditional Random Fields (CRF) techniques with different feature combinations were tested, as well as a deep learning model (Bi-directional-LSTM-CNN). The best performance was achieved by combining two of the CRF-based learners and the neural network using a majority voting ensemble.

Keywords
named entity recognition, CRF, Bi-LSTM, CNN, GloVe

1. Introduction

To understand diseases, we need to extract key entities such as symptoms, duration, and patient age and weight from unstructured medical text. This task, clinical text mining, is important for enabling better clinical decision-making. In a pandemic situation, for example, it is very helpful to extract key entities (such as COVID-19, SARS, and locations) and take appropriate actions based on the disease symptoms and their attributes. Natural Language Processing fills an important role in extracting such key entities from different types of textual sources in various languages.

A myriad of medical texts are generated each day in various languages. In Spanish alone, almost a thousand electronic patient records are generated every minute. Hence, automatically processing clinical texts in Spanish is a challenging task, but one with large potential for the medical user community as well as for the pharmaceutical industry and patients.
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020)
email: utpal.sikdar@gmail.com (U.K. Sikdar); gamback@ntnu.no (B. Gambäck); krishna.kumar@ibsplc.com (M.K. Kumar)
url: https://www.linkedin.com/in/dr-utpal-kumar-sikdar-31a1779b/ (U.K. Sikdar); https://www.ntnu.edu/employees/gamback (B. Gambäck); https://www.linkedin.com/in/m-krishna-kumar-56383220/ (M.K. Kumar)
orcid: 0000-0002-5252-707X (B. Gambäck)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Similar to Named Entity Recognition, tumor mention identification is a sequence labelling task. Following results published by several researchers in 2016 [1, 2, 3], state-of-the-art work on such sequence labelling tasks has focused on deep learning setups using a neural network structure, in particular Long Short-Term Memory Recurrent Neural Networks [LSTM; 4], followed by