DOI: 10.4018/IJeC.2020070105
International Journal of e-Collaboration, Volume 16 • Issue 3 • July-September 2020
Copyright © 2020, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

A New Hybrid Document Clustering for PRF-Based Automatic Query Expansion Approach for Effective IR

Yogesh Gupta, BML Munjal University, Haryana, India
Ashish Saini, Dayalbagh Educational Institute, India

ABSTRACT

Automatic query expansion (AQE) improves information retrieval performance by adding terms to a user query. The pseudo relevance feedback (PRF) method commonly employed for AQE suffers from a major problem: query drift. To address it, this article proposes a new hybrid document clustering approach for PRF-based AQE. In this approach, fuzzy logic and Particle Swarm Optimization (PSO) are used to construct document clusters. A new hybrid PSO and fuzzy logic-based term weighting approach is then used to find more suitable additional query terms by maximizing a weighted score of four IR evidences. In addition, a combined semantic filtering method and query term re-weighting algorithms are used to remove noisy or semantically irrelevant terms. The approaches presented in this article are tested and compared with other approaches on three benchmark data sets. The comparative analysis shows that the proposed approach performs better than the other tested approaches.
KEYWORDS

Automatic Query Expansion, Document Clustering, F-Measure, Fuzzy Logic, Particle Swarm Optimization, Precision, Pseudo Relevance Feedback, Recall

INTRODUCTION

Pseudo Relevance Feedback based Automatic Query Expansion methods (Attar et al., 1977; Buckley et al., 1995; Lavrenko et al., 2001; Robertson et al., 1996) rest on the assumption that the top retrieved documents are relevant and can therefore supply suitable terms for query expansion. In practice, the top retrieved documents in any Information Retrieval (IR) model may contain noise (Gupta et al., 2017). This noise may cause query expansion to "drift" away from the original query. Another problem in IR is dataset size. Datasets are now growing exponentially, and extracting the relevant documents from such huge collections has become a challenging task.

Both problems may be mitigated by document clustering. Clustering algorithms are unsupervised learning tools that group documents so that similar documents (objects) fall into the same cluster. This reduces the search space for retrieving relevant documents. The top documents retrieved after clustering contain less noise than those obtained in PRF-based query expansion over un-clustered documents. Therefore, a new hybrid document clustering and PRF-based AQE approach is proposed in this paper for text document retrieval. An