International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 10 | Oct 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 464

Knowledge Graph and Similarity Based Retrieval Method for Query Answering System

Aaditi Narkhede 1, Prof. M.U.Kulkarni 2, Prof. S.R.Naik 3
1 Student, aaditirajeshn99@gmail.com, Dept. Of Computer Science and Technology, VJTI, Mumbai, India
2 Professor, mukulkarni@ce.vjti.ac.in, Dept. Of Computer Science and Technology, VJTI, Mumbai, India
3 Professor, sraksha@it.vjti.ac.in, Dept. Of Computer Science and Technology, VJTI, Mumbai, India

Abstract - A knowledge graph is a semantic network that represents entities and the connections between them. Visualising data as a knowledge graph aids information analysis and comprehension. Knowledge graphs can support professionals in complex analysis applications and decision support, while also building competitive barriers in the market. A good construction system helps companies build knowledge graphs quickly and efficiently. However, extracting valuable information from such huge, complex and unstructured data still relies on manual effort to handle user queries about that data, which causes delays and uncertainty in decision making and strategy planning. In this paper, we propose an approach for automatic knowledge graph construction, together with an automated querying engine that answers user queries using the generated knowledge graph. The proposed system helps professionals understand and analyse huge, complex, unstructured data faster and more easily. Experimental results show that our proposed solution is more effective in constructing a generic knowledge graph.
Key Words: Knowledge graph, Question Answering system, spaCy, Natural Language Processing, Named Entity Recognition.

1. INTRODUCTION

The goal of a knowledge graph is usually to collect, connect, and present information, and it offers a high level of interpretability. By classifying various content into distinct categories, knowledge graphs help establish purposeful relationships within organisational knowledge [1]. Large volumes of unstructured data are produced daily, in reports, research papers, patents, scholarly articles, book chapters, essays, and speeches, among other formats. Identifying key patterns in these vast amounts of unstructured data is crucial today. Because the essential information is dispersed across large volumes of data, it is difficult to understand and evaluate the implications of an organisation's data.

Automatic information extraction from such data is challenging because of the absence of clear boundaries between the items to be retrieved, the context dependency of the target entities, the variability of language patterns, and the limits of statistical approaches. A further challenge is that this type of data is frequently available only as unstructured text or in PDF format. As a result, either laborious manual preprocessing or sophisticated ETL (Extract, Transform, Load) systems are needed to ingest the data automatically. To address this, in our work the required textual data is first extracted from PDFs using a fine-tuned detectron2-based model and pytesseract OCR, and stored in a text file that is then used for information extraction while building the knowledge graph.
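The PDF-to-text step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a layout model (such as the fine-tuned detectron2 model) has already produced labelled bounding boxes for each page, and that the class names follow PubLayNet-style labels (`Title`, `Text`, `List`). The `Region` type, the `page_text` and `save_corpus` helpers, and the two-column reading-order heuristic are all hypothetical. OCR is passed in as a function so that the filtering and ordering logic can run without detectron2 or Tesseract installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1) in page-image pixels
OcrFn = Callable[[int, Box], str]  # (page index, region box) -> recognised text


@dataclass
class Region:
    """One layout block predicted on a page image (hypothetical format)."""
    label: str  # class name emitted by the layout model
    box: Box


# Assumed PubLayNet-style class names; a fine-tuned model may use others.
TEXT_LABELS = {"Title", "Text", "List"}


def page_text(page_no: int, regions: Sequence[Region], ocr: OcrFn,
              page_width: int) -> str:
    """OCR only the text-bearing regions of one page, in reading order."""
    # Drop figures, tables, etc. so they are never sent to OCR.
    kept = [r for r in regions if r.label in TEXT_LABELS]

    def reading_key(r: Region) -> Tuple[int, int]:
        # Assign each block to the left (0) or right (1) column by the
        # x-coordinate of its centre, then read each column top to bottom.
        centre_x = (r.box[0] + r.box[2]) / 2
        column = 0 if centre_x < page_width / 2 else 1
        return (column, r.box[1])

    kept.sort(key=reading_key)
    return "\n".join(ocr(page_no, r.box).strip() for r in kept)


def save_corpus(pages: Sequence[Sequence[Region]], ocr: OcrFn,
                page_width: int, out_path: str) -> str:
    """Write the text of all pages to one UTF-8 file and return that text."""
    text = "\n\n".join(page_text(i, regions, ocr, page_width)
                       for i, regions in enumerate(pages))
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

In a real pipeline, `ocr` could wrap `pytesseract.image_to_string(page_images[i].crop(box))`, where `page_images` are PIL images rendered from the PDF and the regions are built from a detectron2 predictor's `pred_boxes` and `pred_classes`. The column rule is deliberately simple: full-width blocks such as a paper title are assigned to whichever column contains their centre, so a production system would need a more careful reading-order step.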
The proposed approach aims to resolve issues of ambiguity, abbreviations and semantics of text while constructing a knowledge graph. This is achieved with spaCy-based NER for extracting entity pairs and relations from the data, and the knowledge graph is constructed from the triplets obtained during information extraction. In the proposed question-answering system, query analysis uses a similar spaCy-based approach to extract entity pairs and relations from the user query. The answer extraction module combines two approaches: information retrieval based on the feature information of relevant entities in sentences, in which trained feature classifiers rank the candidate answers; and matching of the query triplets against the knowledge graph database using generic linguistic rules.

1.1 Organization of the Paper

Section 2 of the paper describes previous and current studies in the field of knowledge graphs and question-answering systems, and states the drawbacks and problems of existing approaches. Section 3 explains the proposed methodology and its step-by-step execution with the help of a few examples. Section 4 summarizes and analyses the results obtained and gives insight into how the proposed system can be deployed for different