Distillation of Knowledge from the Research Literature on
Alzheimer's Dementia
Wutthipong Kongburan
King Mongkut's University of
Technology Thonburi
Thailand
58130800102@st.sit.kmutt.ac.th
Mark Chignell
University of Toronto
Canada
chignell@mie.utoronto.ca
Jonathan Chan
King Mongkut's University of
Technology Thonburi
Thailand
jonathan@sit.kmutt.ac.th
ABSTRACT
Many countries are aging societies. Since abilities generally
deteriorate with age, technologies can assist older adults in their
daily life. Loss of cognitive function is particularly severe in cases
of dementia, and around 70% of dementia cases (according to
Alzheimers.net) involve Alzheimer’s Dementia (AD), a
progressive and currently incurable disease. There is
considerable research on AD with thousands of relevant
publications being added to the PubMed online database every
year. The knowledge incorporated in this large body of work is
spread across hundreds of thousands of pages of text, making it
difficult to distill and mobilize that knowledge in terms of
treatments and guidelines. Text mining technology may assist in
distilling knowledge from the vast corpus of research literature
on Alzheimer’s dementia. In this paper, we apply Named
Entity Recognition (NER), a text mining (TM) method
that assigns words to predefined classes, to extract useful
information from free text. We present findings concerning
how well NER can extract information from a corpus of AD
research publications.
CCS CONCEPTS
Applied computing → Life and medical sciences → Health care
information systems
KEYWORDS
Aging society; Alzheimer intervention; Named entity
recognition; PubMed; Quality of life
1. INTRODUCTION
An estimated 10% of the world population was aged 65 or older
as of this writing, and in Japan and many European countries
that proportion is over 20% and climbing. In one example of this
demographic trend, in 2015 Statistics Canada reported that, for
the first time, there were more people aged 65 or over than there
were under 15¹. Meanwhile, in Japan, the proportion of elderly
citizens (over the age of 65) reached 26% in 2015². An
increasing number of older adults is associated with an increased
burden of health problems, because many physical and cognitive
functions decline even with healthy aging, and declines are
typically more pronounced in the case of disease. Alzheimer's
disease (AD) is one of the most prevalent chronic medical
conditions affecting older people and is a major cause of severe
decline in cognition and loss of the ability to live independently.
As of this writing there are close to 50 million cases of AD or
related dementias worldwide³. As many as 50 to 70 percent of
all dementia cases are AD, according to Alzheimers.net. In
addition, 1 in 9 Americans over 65 has AD⁴. Behavioral
symptoms associated with dementia include repetitive speech,
wandering, and sleep disturbances, along with loss of memory
and an increased risk of conditions such as depression and
delirium. There are currently no effective treatments for
AD, and the clinical focus has been on managing the symptoms
of dementia. Since many types of treatment have been proposed,
information about which of them work for the behavioral
problems seen in people at different stages of AD can
enhance quality of life not only for those with AD but also for
their caregivers.
¹ http://www.statcan.gc.ca/daily-quotidien/150929/dq150929b-eng.pdf
² http://www8.cao.go.jp/kourei/english/annualreport/2014/pdf/c1-1.pdf
The main aim of the research reported in this paper is to
demonstrate how Text Mining (TM) can extract useful
information about AD treatments from the scientific literature on
AD. First, we describe the construction of a training dataset
(corpus) from the abstracts of scientific papers that focus on
AD. We then use a Named Entity Recognition (NER) classifier,
trained on this dataset, to label entities of interest within a
sample set of real-world test cases. The results demonstrate that
NER can be used to classify relevant entities within the AD
literature.
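The corpus construction step described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the entity labels (TREATMENT, SYMPTOM), the example title, and the helper names span_of, to_token_labels and to_tsv are all hypothetical and are not taken from the paper's corpus. The sketch converts a text with annotated entity spans into one token per line with its label, a two-column layout commonly used for sequence-labeling training files.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def span_of(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of phrase in text and attach a label."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def to_token_labels(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Split on whitespace and label each token; 'O' marks tokens outside any entity."""
    pairs = []
    for match in re.finditer(r"\S+", text):
        label = "O"
        for start, end, name in spans:
            # A token takes a span's label only if it lies fully inside the span.
            if start <= match.start() and match.end() <= end:
                label = name
                break
        pairs.append((match.group(), label))
    return pairs


def to_tsv(pairs: List[Tuple[str, str]]) -> str:
    """Render (token, label) pairs as tab-separated lines, one token per line."""
    return "\n".join(f"{token}\t{label}" for token, label in pairs)


# Invented example title with hypothetical labels (not from the paper's data).
sentence = "Effect of music therapy on agitation in dementia"
spans = [
    span_of(sentence, "music therapy", "TREATMENT"),
    span_of(sentence, "agitation", "SYMPTOM"),
]
pairs = to_token_labels(sentence, spans)
print(to_tsv(pairs))
```

Whitespace tokenization is used here only to keep the sketch short; a real pipeline would likely use a proper tokenizer so that punctuation does not stick to entity tokens.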
2. BACKGROUND
NER is a key approach to TM that identifies keywords in text
streams and classifies them into predefined relevant categories
such as gene or protein. Various techniques have been proposed
to develop NER systems. They can be categorized as rule-based,
dictionary-based, and Machine Learning (ML)-based (see [2] for
details). As shown in [2, 4, 5], when the
appropriate resources are available, ML-based solutions
offer several advantages and perform better than dictionary-based
and rule-based approaches. In this paper, we use ML-based
TM to address the NER problem.
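To make the task concrete, the following minimal sketch shows the simplest of the three families, a dictionary-based tagger. It is not the ML-based method used in this paper. The lexicon entries, the class names (DRUG, SYMPTOM), and the function name dictionary_tag are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping lower-cased terms to entity classes.
LEXICON: Dict[str, str] = {
    "donepezil": "DRUG",
    "memantine": "DRUG",
    "agitation": "SYMPTOM",
    "wandering": "SYMPTOM",
}


def dictionary_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each token by exact, case-insensitive lookup; 'O' means no match."""
    return [(token, LEXICON.get(token.lower(), "O")) for token in tokens]


# Invented example phrase, split on whitespace.
tagged = dictionary_tag("Trial of donepezil for wandering and agitation".split())
for token, label in tagged:
    print(f"{token}\t{label}")
```

Exact lookup of this kind cannot label a term that is missing from the lexicon or that is spelled differently, which is one commonly cited limitation of dictionary-based approaches compared with classifiers trained on labeled text.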
We used the NER classifier developed at Stanford University.
Stanford NER is a Java implementation of a named entity
recognizer that labels sequences of words in a text, such as the
names of people, locations, and companies. It uses the Conditional
Random Fields (CRFs) technique [8] to train the classifier
on a corpus of documents containing labeled entities.
Other projects that have used CRFs in NER include
Gimli [1] and BANNER [9]. These two open source tools
automatically tag genes, proteins and other entity names in
³ https://www.alz.org/documents_custom/2016-facts-and-figures.pdf
⁴ http://www.alzheimers.net/resources/alzheimers-statistics/
© 2017 International World Wide Web Conference Committee (IW3C2),
published under Creative Commons CC BY 4.0 License.
WWW 2017 Companion, April 3-7, 2017, Perth, Australia.
ACM 978-1-4503-4914-7/17/04.
DOI: http://dx.doi.org/10.1145/3041021.3054934