BioNex: A System For Biomedical News Event Exploration

Patrick Ernst
Max Planck Institut für Informatik, Germany
pernst@mpi-inf.mpg.de

Arunav Mishra
Max Planck Institut für Informatik, Germany
amishra@mpi-inf.mpg.de

Avishek Anand
L3S Research Center, Germany
anand@L3S.de

Vinay Setty
Aalborg University, Denmark
vinay@cs.aau.dk

ABSTRACT
We demonstrate BioNex, a system to mine, rank and visualize biomedical news events. BioNex takes biomedical queries such as "Ebola virus disease" and retrieves the k most relevant news events for them. To achieve this, we first mine generic news events by clustering news articles on a daily basis using general named entities and textual features. These clusters are also tagged with disambiguated biomedical entities, which aid in biomedical news event exploration. We then compute importance scores for the event clusters based on a combination of textual, semantic, popularity and historical importance features. BioNex also visualizes the retrieved event clusters to highlight the top news events and the corresponding news articles for the given query. The visualization also provides the context for news events using (1) a chain of historically relevant news event clusters, and (2) other non-biomedical events from the same day.

KEYWORDS
Biological Event Exploration; Event Clustering; Biomedical Entities

1 INTRODUCTION
Infectious diseases and medical epidemics are still major causes of death and health concerns in underdeveloped countries. Due to the increased movement of people in a connected and interdependent world, there is an increased risk of epidemic diseases such as Ebola virus disease, Zika fever and influenza (swine flu, bird flu, etc.) spreading at a global scale. This was evident from the recent Ebola outbreak in West Africa, which eventually spread to Spain¹ and the United States² via health workers who had traveled to West Africa.
Health organizations such as the WHO and the CDC spend tremendous resources investigating the causes and context of such epidemics in order to be prepared for future disease outbreaks.

¹ http://www.bbc.com/news/world-europe-29514920
² https://goo.gl/wsa4wE

SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan
© 2017 ACM. 978-1-4503-5022-8/17/08...$15.00
DOI: 10.1145/3077136.3084150

During epidemic outbreaks, online news media plays an important role in providing regular warnings and timely updates. For example, the first news article mentioning Ebola from the affected countries was published on March 20th, 2014 (two days prior to the official announcement of the outbreak) [14]. Online news media has also been growing rapidly, with various media outlets and independent providers producing massive amounts of news articles. On the one hand, this has made various analytics tasks possible; on the other hand, it has made it difficult to retrospect on past epidemic outbreaks. For example, the GDELT project³ collects and analyzes hundreds of thousands of news articles each day by crawling more than 6000 online news sources from over 127 countries worldwide. These news collections are rich sources of information about past disease outbreaks and other co-occurring news events. Analyzing these news collections is essential for understanding the context that contributed to the epidemics.
Since these collections extensively cover daily news events, news events about epidemics can be studied in the context of other news events that were popular at the same time.

³ http://gdeltproject.org

For exploratory analysis of biomedical news events, searching over individual news articles is not sufficient. Instead, we need clusters of news articles discussing the same event, which can be used to derive popularity features and historical importance, and to link events to similar events in the past. This requires an efficient and scalable system that facilitates exploratory search over automatically mined biomedical news events, represented as precomputed clusters of news articles drawn from a stream of daily news from numerous media outlets. To aid exploration over large and complex news article clusters, the system should offer a query interface that accepts queries and suggests tentative ones as starting points. In addition, effective cluster visualization and browsing tools are essential for performing deeper analytics on the news article clusters describing an event. Appropriate temporal visualization tools can further aid effective and efficient analytics.

While exploring biomedical news events, it is essential to consider biomedical entities. However, since we are dealing with news articles from diverse geographical regions, biomedical entities may be expressed in different surface forms. For example, swine flu is also reported as swine influenza, H1N1 virus pandemic, pig influenza, hog flu, and pig flu. It is thus crucial to disambiguate the biomedical entities in the text in order to identify news articles that report on the same disease under different surface forms. Even though existing techniques