Contextual Correlation of Concepts in a Web Repository Testbed

S. Sarasvady, P. Pichappan
Department of Library & Information Science, Annamalai University
Annamalainagar 608002, TN, India
Saras_vady@yahoo.com

P. Vijayakumar
Sri Venkateswara College of Engineering
Sriperumpudur 602 105, TN, India
vijai@svce.ac.in

ppich@vsnl.net

Abstract
At present, the online information store is an undiscriminating medium that fails to distinguish between information and trivial knowledge. If the online world supported the context- and content-based manipulation of text that underpins many distributed, knowledge-intensive applications, it could accurately resolve the senses of concepts in a heterogeneous corpus.

1. Introduction
In the print environment, context specification is done by human processors who understand how a concept is treated in a paper. Human information processing addresses how the shared structure of knowledge about subject domains corresponds to the personal semantic and syntactic structures that users hold in their minds. In the web environment, pages are largely indexed by crawlers, which do not understand how concepts are treated.

Indexing information, and research on indexing, are traditional activities for people interested in information processing. However, the evolution of the web poses challenges and threats to online information processing, and online information processing and retrieval research is still an experimental science. The research and knowledge produced in information processing have been expanding in step with the growth of online information. The central issue in information processing research is understanding and measuring relevance, both between queries and the information store and 'within the information store'.
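To make "measuring relevance between queries and the information store and within the store" concrete, here is a minimal sketch that uses the standard vector-space model with cosine similarity over raw term frequencies. This is an assumed, swapped-in technique for illustration, not the authors' method; the toy documents `d1`, `d2`, `d3`, the tokenizer, and the helper names `term_vector` and `cosine` are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Hypothetical helper: bag-of-words term frequencies of lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy information store (not data from the paper).
store = {
    "d1": "concept indexing of web pages",
    "d2": "web crawlers index pages",
    "d3": "library catalogue of print books",
}
vectors = {doc_id: term_vector(text) for doc_id, text in store.items()}

# Relevance between a query and the store: score every document against the query.
query = term_vector("indexing web pages")
query_scores = {doc_id: cosine(query, vec) for doc_id, vec in vectors.items()}
ranked = sorted(query_scores, key=query_scores.get, reverse=True)

# Relevance "within the store": score every unordered pair of documents.
pair_scores = {
    (a, b): cosine(vectors[a], vectors[b])
    for a in vectors
    for b in vectors
    if a < b
}
```

Here `d1` ranks above `d2`, and `d3` scores zero. Note that `d2` contains "index" while the query says "indexing": exact term matching cannot recognise that different words express the same concept, which is the kind of limitation that motivates context-sensitive processing.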
The web repository is too chaotic and undiscriminating a medium to be entrusted with the communication and archiving of substantive ideas and findings.

2. Background
In the online environment, a query retrieves a large collection of files together, but these files often do not relate to each other semantically. Two concepts are considered semantically similar if they tend to co-occur in the same context across distributed files and structures. Although this assumption is easy to understand, problems emerge when putting it into practice. In many instances, words do not express concepts precisely, and the mere co-occurrence of words does not ensure that the concepts co-occur. It has been observed that two people use the same term to describe the same concept in fewer than 20% of cases [1] (Deerwester et al. 1990). Because online content consists of unstructured objects, creating correct semantic relationships requires special-purpose heuristics. Information processing research can accurately resolve the senses of words in a large heterogeneous corpus.

Both traditional information science and modern processing systems have attempted to preserve the context of concepts during processing, and algorithms have been deployed to capture semantic features in indexing. Because context preservation rests mainly on linguistic considerations, research has focused on these considerations to enhance context specificity. If two words frequently co-occur within a particular contextual range, such as adjacent positions, in a web page or an online collection, the pair is expected to show the same distribution pattern within that contextual range across the online collection. When
Page 533 EurAsia-ICT 2002, Shiraz-Iran, 29-31 Oct.