DOI: 10.4018/IJKM.2020010104
International Journal of Knowledge Management
Volume 16 • Issue 1 • January-March 2020
Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
A Hybrid Approach to Retrieve Knowledge from a Document
Deepak Sahoo, IIIT Bhubaneswar, Bhubaneswar, India
Rakesh Chandra Balabantaray, IIIT Bhubaneswar, Bhubaneswar, India
ABSTRACT
Retrieving the theme of a document and presenting it to the user in a form shorter than the original
text is a challenging task. In this article, a hybrid approach to extracting knowledge from a text
document is presented, in which three key sentence-level relationships, in combination with the
Markov clustering algorithm, are used to cluster the sentences of the document. After clustering, the
sentences in each cluster are ranked, and the highest-ranked sentences from the clusters are merged.
Finally, to obtain the theme of the document, the gradient boosting technique XGBoost is used to
compress the newly generated sentences. The DUC-2002 data set is used to evaluate the proposed
system, and the proposed system is observed to perform better than the existing systems it is compared with.
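To make the clustering and selection stages concrete, the following is a minimal sketch, not the authors' implementation. This section does not specify the three sentence-level relationships, so the sketch substitutes a single bag-of-words cosine similarity as an assumption. It then runs a plain Markov clustering (MCL) loop (expansion, inflation, column normalization) on the similarity graph and keeps one sentence per cluster. The ranking criterion (total similarity to the other cluster members) and all function names (`similarity_matrix`, `mcl`, `summarize`) are hypothetical, and the XGBoost compression step is omitted.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase word tokens; a stand-in for real preprocessing (assumption).
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(sentences, threshold=0.1):
    # Hypothetical single relationship standing in for the paper's three.
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 1.0  # self-loops, customary in MCL
            else:
                sim = cosine(bags[i], bags[j])
                m[i][j] = sim if sim >= threshold else 0.0
    return m


def normalize_columns(m):
    # Make every column sum to 1 (column-stochastic matrix), in place.
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(m, expansion=2, inflation=2.0, iterations=50, tol=1e-6):
    m = normalize_columns([row[:] for row in m])
    for _ in range(iterations):
        prev = m
        # Expansion: matrix power spreads flow along longer paths.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: elementwise power strengthens strong flows, then renormalize.
        m = normalize_columns([[v ** inflation for v in row] for row in expanded])
        n = len(m)
        if all(abs(m[i][j] - prev[i][j]) < tol for i in range(n) for j in range(n)):
            break
    # Non-zero rows belong to attractors; each row's support is one cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


def summarize(sentences, **mcl_args):
    sim = similarity_matrix(sentences)
    picked = []
    for cluster in mcl(sim, **mcl_args):
        # Hypothetical ranking: highest total similarity to other members.
        best = max(cluster, key=lambda i: sum(sim[i][j] for j in cluster if j != i))
        picked.append(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(picked)]
```

For four sentences covering two unrelated topics (two about a cat, two about stock markets), `mcl` separates them into two clusters, and `summarize` returns one representative sentence from each. In the described system, those selected sentences would then be merged and compressed.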
KEYWORDS
Knowledge Retrieval, ROUGE Score, Sentence Clustering, Sentence Compression, Sentence Merging, XGBoost
INTRODUCTION
Knowledge management (KM) is a method that originated in the business world for unifying the huge
amounts of documents generated from meetings, proposals, presentations, analytic papers, and training
materials (Bordoni et al., 2002). The documents created in an organization represent its potential
knowledge: “potential” because only parts of this data and information will prove helpful to
organization members in creating organizational knowledge. In this view, one major challenge is
selecting relevant information from vast amounts of documents and making it available for use
and re-use by organization members. The objective of the “mainstream” of knowledge management
is to ensure that the right information is delivered to the right person at the right time, so that
the most appropriate decision can be taken. In this sense, KM is aimed not at managing knowledge per se,
but at relating knowledge to its usage. In line with this view, we focus on extracting relevant
information to be delivered to a decision maker.
The knowledge pyramid has been used for many years to illustrate the hierarchical relationships
between data, information, knowledge, and wisdom. The revised knowledge pyramid model proposed
by Jennex (2013, 2017) includes knowledge management as the extraction of reality, with a focus on
organizational learning.
To this end, a range of Text Mining (TM) and Natural Language Processing (NLP) techniques can
serve as an effective Knowledge Management System (KMS), supporting the extraction of relevant
information from large amounts of unstructured textual data and, thus, the creation of knowledge
(Bordoni et al., 2002).
There has been an explosion in the amount of text data from a variety of sources. This volume of
text is an invaluable source of information and knowledge which needs to be effectively summarized to