International Journal of Intelligent Systems and Applications in Engineering (IJISAE)
ISSN: 2147-6799, www.ijisae.org
Original Research Paper
IJISAE, 2022, 10(2s), 241-245

Weighted Hashing-Based Capture Text Similarity Estimation with the Cross-Media Semantic Level

Lamhot Naibaho 1, Kuldeep Singh Kaswan 2, Pankaja R 3, Shrishailappa Patil 4, Santosh Mitkari 5, Dr. Vivek Nivruttirao Waghmare 6

Submitted: 20/08/2022 Accepted: 23/11/2022

Abstract: Web mining is an emerging area driven by rapid advances in data mining techniques. The web mining process comprises a sequence of operations over content in different languages, all of which must be processed effectively. In this work, the similarity between ontology words and word sequences is estimated. This paper proposes a Weighted Hashing Similarity Estimation (WHSE) model. The WHSE model assigns weighted values to the estimated semantics. The computed semantics are stored in a hash table, which is used to estimate the features of the variables. WHSE computes a similarity score for the semantic word features extracted from the ontology and identifies the keywords. The performance of the WHSE model is compared with that of an existing technique. The measured recall, precision, and accuracy show that WHSE achieves an accuracy of 0.98 on the semantic ontology. The comparative analysis shows that WHSE improves on the existing technique by approximately 3% to 7% at the semantic level.

Keywords: Web Mining, Semantic Level, Weighted Hashing, Text, Similarity Value, Cross-Media

1. Introduction

Web mining is the process of discovering patterns from the World Wide Web (WWW) and is one application of data mining techniques.
Web mining techniques play a crucial role in dealing with the information overload problem in large-scale data collections [1]. Most notably, web mining combines the information retrieval (IR) process with data mining techniques to find the data patterns that the user is looking for [2]. To meet the requirements of different users, large, dynamic, and unstructured web pages are continuously being created. Many technologies, including web mining algorithms and traditional data mining algorithms, have since been employed to analyze the large volumes of data collected in web logs [3]. By exploring web pages and accurately acquiring the required information, web mining improves the performance of the IR process. Web mining comprises three categories: web usage mining, web structure mining, and web content mining. Due to the dramatic growth of web content, IR has become a critical task in the real world. Moreover, the massive amount of data on the Internet has also made knowledge management and information access extremely challenging [4].

Keyword-based IR techniques are the traditional and most popular tools for retrieving relevant results from the WWW; however, they struggle to deliver related information as web data continuously evolves [5]. A keyword-based technique matches the user query against the data on the web and retrieves the results that contain one or more of the query terms specified by the user [6]. As a result, it can deliver irrelevant information, because the keywords of the user query alone carry inadequate information. Thus, the search engine needs to understand the intention of the user query and the exact context of the query terms to improve search accuracy [7].
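The limitation of keyword matching described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names `tokenize` and `keyword_search` and the three sample documents are hypothetical, and the ranking by number of shared terms is an assumption made only for illustration.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, documents):
    """Return the documents sharing at least one term with the query,
    ordered by the number of shared terms (largest overlap first)."""
    query_terms = tokenize(query)
    scored = []
    for doc in documents:
        overlap = len(query_terms & tokenize(doc))
        if overlap > 0:
            scored.append((overlap, doc))
    # Stable sort on the overlap count only, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


# Hypothetical sample data, not from the paper.
documents = [
    "Apple releases a new laptop model",
    "How to bake an apple pie",
    "Notebook computers compared by battery life",
]
results = keyword_search("apple laptop", documents)
print(results)
```

In this sketch the query "apple laptop" retrieves the recipe document because it shares the term "apple", while the document about notebook computers is missed because it shares no literal term with the query, even though it is topically related. This is the kind of mismatch that a semantic similarity measure is meant to reduce.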
1 Dr., S.Pd., M.Hum., Lecturer/Researcher, English Language Education Study Program, Faculty of Letters and Languages, Christian University of Indonesia. Email: lamhot.naibaho@uki.ac.id, ORCID: 0000-0001-9893-7165
2 Professor, Department of CSE, Galgotias University, Greater Noida, Uttar Pradesh, India. Email: kuldeep.kaswan@galgotiasuniversity.edu.in, ORCID: 0000-0003-0876-0330
3 Assistant Professor, Information Science and Engineering, Sri Venkateshwara College of Engineering, Bengaluru, India. Email: pankaja.ssu@gmail.com, ORCID: 0000-0001-5752-8023
4 Professor, Computer, Vishwakarma Institute of Technology, Pune, India. Email: patil.st@vit.edu, ORCID: https://orcid.org/0000-0002-9440-3446
5 Assistant Professor, Mathematics, Bharati Vidyapeeth's College of Engineering for Women, Pune, India. Email: santosh.mitkari@bharatividyapeeth.edu, ORCID: 0000-0002-4689-5044
6 Associate Professor, Information Technology, SITRC, Nashik, India. Email: dr.vnwaghmare@gmail.com

Adding a semantic dimension to conventional IR techniques helps the search tool provide intelligent and relevant information from the massive amount of web data [8]. A semantic-level information retrieval system identifies the relevant keywords and also determines the meaning of the query terms, which enables the system to retrieve all the information related to the user query [9]. Moreover, semantic similarity measures have an impact on numerous data mining applications, such as paraphrase recognition, Word Sense Disambiguation (WSD), document retrieval, malapropism detection, and text categorization. In everyday use, language keeps changing as new concepts and trends emerge over time [10]. These dynamic changes in language exhibit similarity in certain aspects, based on the estimated features. Because concepts and ideas are updated dynamically, an effective scheme is needed to derive significant results [11]. Through semantic analysis, linguistic resources are computed based on WordNet and Latent Semantic Analysis (LSA) for