International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 10 | Oct 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 464
Knowledge Graph and Similarity Based Retrieval Method for Query
Answering System
Aaditi Narkhede¹, Prof. M.U.Kulkarni², Prof. S.R.Naik³
¹Student, aaditirajeshn99@gmail.com, Dept. of Computer Science and Technology, VJTI, Mumbai, India
²Professor, mukulkarni@ce.vjti.ac.in, Dept. of Computer Science and Technology, VJTI, Mumbai, India
³Professor, sraksha@it.vjti.ac.in, Dept. of Computer Science and Technology, VJTI, Mumbai, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - A knowledge graph is a semantic network that
represents the connections between entities. Visualising data
as a knowledge graph helps with information analysis and
comprehension. Knowledge graphs can help professionals with
complex analysis applications and decision support while also
erecting barriers in the market. A good construction system
can help companies construct knowledge graphs quickly and
efficiently. However, extracting valuable information from
huge, complex and unstructured data currently requires manual
effort to handle user queries about the data, which causes
delays and uncertainty in decision making and strategy
planning. In this paper, we propose an approach for automatic
knowledge graph construction and an automated querying engine
that answers user queries using the generated knowledge
graph. The proposed system will benefit professionals through
faster, easier understanding and analysis of large, complex,
unstructured data. Experimental results show that our
proposed solution is more effective in constructing a generic
knowledge graph.
Key Words: Knowledge graph, Question Answering
system, spaCy, Natural Language Processing, Named
Entity Recognition.
1. INTRODUCTION
The goal of a knowledge graph is usually to collect,
connect, and present information, and it offers a high level
of interpretability. By classifying content into distinct
categories, knowledge graphs help establish purposeful
relationships within organisational knowledge [1]. Large
volumes of unstructured data are produced daily in the form
of reports, research papers, patents, scholarly articles,
book chapters, essays, and speeches, among other formats.
Identifying key patterns in vast amounts of unstructured data
is crucial in today's environment. It is challenging to
understand and evaluate the implications of an organization's
data because the essential information is dispersed across
large volumes of data.
Automatic information extraction from such vast amounts of
data is challenging because of the absence of boundaries
between the items to be retrieved, the context dependency of
the target entities, the variability in language patterns,
and the limits of statistical approaches. A further challenge
is that this type of data is frequently available only as
unstructured text or in PDF format. As a result, either
laborious manual preprocessing is required or sophisticated
ETL (extract, transform, load) systems are used to ingest the
data automatically. To handle this challenge, in our work the
required textual data is first extracted from PDFs using a
fine-tuned Detectron2-based model and pytesseract OCR, and is
stored in a text file that is then used for information
extraction while building the knowledge graph.
The proposed approach aims to resolve issues of ambiguity,
abbreviations and text semantics while constructing a
knowledge graph. This is achieved by using spaCy-based NER to
extract entity pairs and relations from the data. The
knowledge graph is constructed from the triplets obtained
during information extraction. In the proposed
Question-Answering system, query analysis uses a similar
spaCy-based approach to extract the entity pair and relation
from the user query. The Answer Extraction module combines
two approaches: information retrieval based on the feature
information of relevant entities in sentences, which uses
trained feature classifiers to sort the candidate answers and
obtain the solutions; and matching of query triplets against
the knowledge graph database using generic linguistic rules
designed to obtain the solutions.
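The triplet-matching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `KnowledgeGraph` class, its `add` and `match` methods, the relation names, and the hand-written triplets are all hypothetical, and exact string comparison stands in for the linguistic rules and spaCy-based extraction used in the proposed system. A query is treated as a triplet with one unknown slot, and the answers are the values that fill that slot.

```python
from typing import List, Optional, Set, Tuple

# A triplet is (subject, relation, object); all names here are illustrative.
Triplet = Tuple[str, str, str]


class KnowledgeGraph:
    """Toy triplet store: each triplet is a directed, labelled edge."""

    def __init__(self) -> None:
        self.triplets: Set[Triplet] = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        # Lower-case everything so that matching is case-insensitive.
        self.triplets.add((subject.lower(), relation.lower(), obj.lower()))

    def match(
        self,
        subject: Optional[str] = None,
        relation: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triplet]:
        """Return the sorted triplets that agree with every slot that is not None."""
        pattern = (subject, relation, obj)
        return sorted(
            t
            for t in self.triplets
            if all(p is None or p.lower() == v for p, v in zip(pattern, t))
        )


# Hand-written triplets; in the proposed system these would come from
# the information extraction step rather than being typed in.
kg = KnowledgeGraph()
kg.add("VJTI", "located_in", "Mumbai")
kg.add("Mumbai", "located_in", "India")
kg.add("spaCy", "performs", "named entity recognition")

# "Where is VJTI located?" becomes the query triplet (VJTI, located_in, ?).
answers = [obj for _, _, obj in kg.match(subject="VJTI", relation="located_in")]
print(answers)
```

Leaving a slot as `None` acts as a wildcard, so the same `match` call can serve subject, relation, or object questions. A fuller system would replace the exact comparison with similarity scoring over candidate answers.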
1.1 Organization of the Paper
Section 2 of the paper describes previous and ongoing work
in the field of Knowledge Graphs and Question-Answering
Systems. It also states the drawbacks and problems of
existing approaches. Section 3 explains the proposed
methodology and its step-by-step execution with the help of a
few examples. Section 4 summarizes and analyses the results
obtained and gives insight into
how the proposed system can be deployed for Different