Proceedings of the fifth BioCreative challenge evaluation workshop.

Recognition and normalization of disease mentions in PubMed abstracts

Jitendra Jonnagaddala 1,2, Nai-Wen Chang 3,4, Toni Rose Jue 2, Hong-Jie Dai* 5

1 School of Public Health and Community Medicine, UNSW Australia
2 Prince of Wales Clinical School, UNSW Australia
3 Institution of Information Science, Academia Sinica, Taiwan
4 Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taiwan
5 Department of Computer Science and Information Engineering, National Taitung University, Taiwan
{z3339253, t.jue}@unsw.edu.au
d00945020@ntu.edu.tw
hjdai@nttu.edu.tw

Abstract. The rapidly increasing number of available PubMed documents calls for automatic approaches to identifying and normalizing disease mentions, in order to improve the precision and effectiveness of information retrieval. We describe our team's participation in the Disease Named Entity Recognition and Normalization subtask of the chemical-disease relations track of the BioCreative V shared task. We developed a CRF-based model using the BIESO tagging format to recognize disease entities in PubMed abstracts automatically. Recognized disease entities were normalized to MeSH concepts using a Lucene-based dictionary look-up method. Performance is reported in terms of precision, recall and F-measure for three separate runs. Our best run achieved an F-measure of 80.74% on disease mention recognition and 67.85% on disease normalization.

Keywords: Disease normalization; Disease recognition; Dictionary lookup; CRF; Disorder identification; Information extraction

1 Introduction

The importance of recognizing disease mentions and normalizing them to a standardized vocabulary grows with the volume of biomedical literature published each year [1].
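To make the BIESO tagging format mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes pre-tokenized text, entity spans given as token indices (start inclusive, end exclusive), and a single hypothetical label "Disease"; the example sentence and the helper names `to_bieso` and `from_bieso` are illustrative only. Each token gets B (begin), I (inside), E (end), S (single-token entity) or O (outside); a CRF would be trained on such tag sequences, and its predicted tags decoded back into spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start token index, exclusive end token index)


def to_bieso(tokens: List[str], spans: List[Span], label: str = "Disease") -> List[str]:
    """Encode entity spans as BIESO tags (hypothetical helper)."""
    tags = ["O"] * len(tokens)  # every token starts as Outside
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token of a multi-token entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # interior tokens
            tags[end - 1] = f"E-{label}"  # last token
    return tags


def from_bieso(tags: List[str]) -> List[Span]:
    """Decode BIESO tags back into spans, ignoring ill-formed sequences."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        prefix = tag.split("-", 1)[0]
        if prefix == "S":
            spans.append((i, i + 1))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1))
            start = None
        elif prefix == "O":
            start = None  # an O breaks any open entity
        # "I" keeps the current entity open
    return spans


# Hypothetical example sentence with two disease mentions.
tokens = ["Cisplatin", "induced", "acute", "renal", "failure", "and", "nausea", "."]
spans = [(2, 5), (6, 7)]
tags = to_bieso(tokens, spans)
print(list(zip(tokens, tags)))
```

Here "acute renal failure" is tagged B-Disease, I-Disease, E-Disease and "nausea" S-Disease. Compared with plain BIO, the explicit E and S tags let a sequence model learn entity boundaries and single-word mentions as distinct states.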
Keywords relating to diseases are the second most common type of user search query in PubMed, one of the most widely used biomedical literature databases [2]. Due to the doubling of the available biomedical literature, researchers are now

* Corresponding author