Int. J. Business Intelligence and Data Mining, Vol. 11, No. 2, 2016
Copyright © 2016 Inderscience Enterprises Ltd.
BNEMiner: mining biomedical literature for extraction
of biological target, disease and chemical entities
Sindhuja Gopalan and Sobha Lalitha Devi*
AU-KBC Research Centre,
MIT Campus,
Anna University,
Chennai, India
Email: sindhujagopalan@au-kbc.org
Email: sobha@au-kbc.org
*Corresponding author
Abstract: This paper presents a novel application that uses machine learning
techniques to automatically extract biomedical entities from large volumes of
biomedical text. Such text is accumulating rapidly, and its volume calls for
automatic information extraction. Data mining is the science of extracting
information from large data sets, and biomedical named entity recognition
(BioNER) is the data mining task of extracting named entities from biological
texts. In this paper, we focus on developing a BioNER system for extracting
biological target, disease and chemical entities from biomedical texts. We
developed the system using conditional random fields (CRFs), a graph-based
machine learning technique. We applied a diverse feature set that combines
standard lexical, syntactic and orthographic features with novel, biologically
inspired features: action terms and process verbs. The system was evaluated on
three widely recognised datasets, and the results demonstrate its portability
and effectiveness.
Keywords: data mining; biomedical entities; graph-based model; biologically
motivated features; portability.
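The abstract names the feature families but not how they are encoded. As a hedged illustration only, the sketch below turns each token into a feature dictionary of the kind that CRF toolkits such as sklearn-crfsuite accept as input (one list of dictionaries per sentence). The lexicons `ACTION_TERMS` and `PROCESS_VERBS`, the Greek-letter list, the one-token context window and all function names are hypothetical assumptions, not the authors' resources or implementation; syntactic features such as part-of-speech tags are omitted for brevity.

```python
import re
from typing import Dict, List

# Hypothetical example lexicons; the authors' actual lists are not given here.
ACTION_TERMS = {"inhibitor", "agonist", "antagonist", "blocker"}
PROCESS_VERBS = {"inhibits", "activates", "binds", "induces", "regulates"}
GREEK = {"alpha", "beta", "gamma", "delta", "kappa"}


def word_shape(token: str) -> str:
    """Map characters to classes (X upper, x lower, d digit) and squeeze repeats."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Collapse runs of the same class, e.g. "Xxxxxxxx" -> "Xx".
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Lexical, orthographic and (hypothetical) biologically inspired features."""
    tok = tokens[i]
    low = tok.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": low,            # lexical
        "prefix3": low[:3],
        "suffix3": low[-3:],
        "is_title": tok.istitle(),    # orthographic
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_greek": any(g in low for g in GREEK),  # substring match
        "shape": word_shape(tok),
        "action_term": low in ACTION_TERMS,   # hypothetical lexicon lookup
        "process_verb": low in PROCESS_VERBS,  # hypothetical lexicon lookup
    }
    # One-token context window on each side.
    if i > 0:
        prev = tokens[i - 1].lower()
        feats["-1:word.lower"] = prev
        feats["-1:process_verb"] = prev in PROCESS_VERBS
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        nxt = tokens[i + 1].lower()
        feats["+1:word.lower"] = nxt
        feats["+1:action_term"] = nxt in ACTION_TERMS
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a tokenised sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for f in sentence_features(["Imatinib", "inhibits", "BCR-ABL", "kinase"]):
        print(f["word.lower"], f["shape"], f["process_verb"])
```

In this sketch, a token that follows a process verb (here "BCR-ABL" after "inhibits") receives a context feature that a CRF could learn to associate with a target entity label; the CRF training step itself is left to the toolkit.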
Reference to this paper should be made as follows: Gopalan, S. and Devi, S.L.
(2016) ‘BNEMiner: mining biomedical literature for extraction of biological
target, disease and chemical entities’, Int. J. Business Intelligence and Data
Mining, Vol. 11, No. 2, pp.190–204.
Biographical notes: Sindhuja Gopalan is a Research Engineer working with
the Computational Linguistic Research Group of AU-KBC Research Centre,
Anna University, Chennai, India. Her research interests include semantic text
processing (data mining and text mining) and discourse analysis. She holds a
Master's degree in Bioinformatics. Currently, she is pursuing her PhD in BioNLP
with Sobha Lalitha Devi. She has participated in international evaluation tasks
such as BioCreative and the CoNLL Shared Task. She was a Visiting Research
Scholar at Universitat Politècnica de València (UPV).
Sobha Lalitha Devi is a Scientist in the Information Sciences Division of
AU-KBC Research Centre, Anna University, Chennai, India. Her research
interests are in the field of discourse analysis, text mining, information
extraction and retrieval. She specialises in the area of anaphora resolution. She
works on various genres, such as newswire and biomedical texts, and on various
language families.