BioNex: A System For Biomedical News Event Exploration
Patrick Ernst
Max Planck Institut für Informatik
Germany
pernst@mpi-inf.mpg.de
Arunav Mishra
Max Planck Institut für Informatik
Germany
amishra@mpi-inf.mpg.de
Avishek Anand
L3S Research Center
Germany
anand@L3S.de
Vinay Setty
Aalborg University
Denmark
vinay@cs.aau.dk
ABSTRACT
We demonstrate BioNex, a system to mine, rank and visualize
biomedical news events. BioNex takes biomedical queries such
as “Ebola virus disease” and retrieves the k most relevant news
events for them. To achieve this we first mine the generic news
events by clustering them on a daily basis using general named
entities and textual features. These clusters are also tagged with
disambiguated biomedical entities which aid in biomedical news
event exploration. These clusters are then used to compute the
importance scores for the event clusters based on a combination
of textual, semantic, popularity and historical importance features.
BioNex also visualizes the retrieved event clusters to highlight
the top news events and corresponding news articles for the given
query. The visualization also provides the context for news events
using (1) a chain of historically relevant news event clusters, and
(2) other non-biomedical events from the same day.
KEYWORDS
Biological Event Exploration; Event Clustering; Biomedical Entities
1 INTRODUCTION
Infectious diseases and medical epidemics are still major causes of
death and health concerns in underdeveloped countries. Due to
increased movements of people in a connected and interdependent
world there is an increased risk of spreading epidemic diseases such
as Ebola virus disease, Zika fever and influenza (such as swine flu
and bird flu) at a global scale. This was evident from the recent
Ebola outbreak in West Africa, which eventually spread to
Spain¹ and the United States² via health workers who traveled to West
Africa. Health organizations such as the WHO and the CDC spend
tremendous resources to investigate the reasons and context of
such epidemics to be prepared for future disease outbreaks.
¹ http://www.bbc.com/news/world-europe-29514920
² https://goo.gl/wsa4wE
SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan
© 2017 ACM. 978-1-4503-5022-8/17/08. . . $15.00
DOI: 10.1145/3077136.3084150
During epidemic outbreaks, online news media plays an important
role in providing regular warnings and timely updates. For example,
the first news article mentioning Ebola from the affected countries
was published on March 20th, 2014 (two days prior to the official
announcement of the outbreak) [14]. Online news media has also
been growing rapidly with various media outlets and independent
providers producing massive amounts of news articles. On the one
hand, this has made various analytics tasks possible; on the other
hand, it has made it difficult to look back on past epidemic outbreaks.
For example, the GDELT project³ collects and analyzes hundreds of
thousands of news articles each day by crawling more than 6000
online news sources from over 127 countries worldwide.
These news collections are rich sources of information about
past disease outbreaks, and other co-occurring news events. For
understanding the context which contributed to the epidemics,
analyzing these news collections becomes essential. Since these
collections extensively cover daily news events, the news events
about the epidemics can be studied in the context of other news
events that were also popular at the same time.
For exploratory analysis of biomedical news events, searching
over individual news articles is not sufficient. Instead, we need clusters
of news articles discussing the same event, which can be used
to derive popularity and historical importance features and to link
events to similar events in the past. An efficient and scalable system is
thus required that facilitates exploratory search
over automatically mined biomedical news events, represented as
precomputed clusters of news articles drawn from a stream of
daily news from numerous media outlets. To aid exploration
over large and complex news article clusters, the system should
expose a query interface that accepts and suggests tentative queries
as starting points. In addition, effective cluster visualization and
browsing tools are essential for performing deeper analytics
on the news article clusters describing an event. Appropriate
temporal visualization tools can make this analysis more effective and efficient.
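To make the notion of per-day event clusters and a popularity feature concrete, the following is a minimal Python sketch. It is not the clustering used in BioNex: the greedy single-pass strategy, the Jaccard threshold of 0.5, the `Article` and `Cluster` records, and the outlet-count popularity measure are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; field names are illustrative, not taken from BioNex.
    day: str             # publication date, e.g. "2014-03-20"
    source: str          # media outlet name
    entities: frozenset  # named entities detected in the article


@dataclass
class Cluster:
    day: str
    articles: list = field(default_factory=list)
    entities: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard overlap between two entity sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_daily(articles, threshold=0.5):
    """Greedy single-pass clustering of articles within each day (assumed strategy)."""
    by_day = defaultdict(list)
    for art in articles:
        by_day[art.day].append(art)

    clusters = []
    for day in sorted(by_day):
        day_clusters = []
        for art in by_day[day]:
            # Find the same-day cluster whose entity set overlaps most with the article.
            best = max(
                day_clusters,
                key=lambda c: jaccard(c.entities, art.entities),
                default=None,
            )
            if best is not None and jaccard(best.entities, art.entities) >= threshold:
                best.articles.append(art)
                best.entities |= art.entities
            else:
                day_clusters.append(Cluster(day, [art], set(art.entities)))
        clusters.extend(day_clusters)
    return clusters


def popularity(cluster):
    """Toy popularity feature: number of distinct outlets covering the event."""
    return len({a.source for a in cluster.articles})
```

Under these assumptions, two same-day articles that share most of their entities fall into one cluster, and an event reported by more outlets receives a higher popularity value. A real system would likely combine such a signal with textual and historical features.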
When exploring biomedical news events, considering the
biomedical entities is essential. However, since we are dealing with
news articles from diverse geographical regions, the biomedical entities
may be expressed in different surface forms. For example, swine
flu is also reported as swine influenza, H1N1 virus pandemic, pig
influenza, hog flu, and pig flu. It is thus crucial to disambiguate the
biomedical entities in the text to identify news articles reporting a
disease in different surface forms. Even though existing techniques
³ http://gdeltproject.org