International Journal of Computer Applications (0975 – 8887)
Volume 49 – No. 15, July 2012

Automatic File Indexing Framework: An Effective Approach to Resolve Dangling File Pointers

Yasas Diniesha Jayaweera
Department of Information Technology
Sri Lanka Institute of Information Technology

ABSTRACT
Today, managing files in a server system is a problem of the same magnitude as managing the World Wide Web, owing to the dynamic nature of the file system. Searching for files is time consuming because finding a file on a hard disk is a long-running task: every file on the disk has to be read, and the search runs into dangling pointers to files that no longer exist because they have been changed, moved or deleted. This frustrates the user. The Automatic File Indexing Framework helps users resolve file names and locate documents stored in file repositories. The main design objective of the framework is to maintain sub-indexes at the folder level that automatically hold full knowledge of the revisions made at that level. This research proposes a framework that manages the creation and maintenance of the file index using the Resource Description Framework (RDF), with retrieval through semantic query languages, i.e. SPARQL. The sub-indexes are maintained hierarchically and recursively, starting from the leaf nodes and moving up to the root node. The proposed framework will monitor the file system continuously and update the individual folder descriptors (sub-indexes) stored on each node as the file system changes, making the cached indexes resilient to file changes, including file or folder name changes. Further, the study explores avenues to build an offline semantic index that clients can use to perform distributed file search without running the search on the server itself.
This is viable because the framework uses semantic languages to describe and build file descriptors, which integrate easily with semantic indexing and make the index readily available for the Web.

General Terms
File Indexing, File Retrieval, Knowledge Management, Semantic Web Technologies.

Keywords
File Indexing, Document Retrieval, Semantic Web, RDF (Resource Description Framework), SPARQL.

1. INTRODUCTION
Today, an enormous number of files are stored on Web servers and even on personal computers (PCs). The PC has become a Web of files, and the growth of the file system is exponential. Locating stored files manually in a timely manner is a herculean task, and manually retrieving a file is tedious unless one can remember where it is stored. To reduce the time spent on manual searching, search tools help users find files, and many index creation tools exist to avoid user frustration. To locate files, an application uses a file index. Given the rapid increase in the number of files in a system, it is the job of file indexing tools to provide an interface to applications that helps locate and retrieve the files the user wants.

Current tools on the market improve searching through manual or automatic indexing, which speeds up file retrieval, but dangling file pointers still exist, making it difficult to locate a file when its location or name changes. Most Windows and Web-based applications index recently used file entries as a quick reference for users. But when a file is moved, changed or deleted, the entries in the index remain unchanged, leaving them dangling and pointing to null references, which frustrates the user. This is due either to the absence of a central index or to an obsolete index. In either case the index is of no use, since the user has no way of retrieving a file that has been moved or whose path has changed.
Building an index from scratch takes a long time, depending on the size of the file system. Once built, the index must be kept up to date by incorporating file system changes; unless it is updated in a timely manner, it becomes obsolete. Many indexing tools exist, but their indexes do not survive when a file is moved, changed or deleted. Tools such as Windows 7 search and indexing help users find what they need through a search interface. The search result can be indexed for later retrieval by storing the query used to search the files, but no metadata is maintained about file system changes. The index is therefore unaware of those changes, and the problem persists whenever the file system changes. The stored query retrieves the file system with its new changes, but the application may still refer to the old index references, leaving the changed entries dangling. This can happen for one of the following reasons: the file has been deleted, the file name has been changed, the file has been moved, or the file path has been changed. If the index system does not maintain metadata about file changes, it cannot identify what caused the retrieval to fail. For instance, if a file name has been changed, the application uses the previously indexed, old symbolic file name, which does not exist in the current file system. At a glance, indexes help applications locate what the user wants, but without metadata about file changes the index becomes obsolete and cannot locate files once the file system has changed.

The primary objective of the proposed framework is to trace path and file changes in a metadata file stored locally in each folder of the folder hierarchy. This helps applications locate a file if the file exists.
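The idea of a per-folder metadata file that traces name changes can be illustrated with a minimal sketch. The framework's descriptors are RDF documents queried with SPARQL; the sketch below substitutes a plain JSON file for brevity, and the descriptor name `.folder_index.json` and the functions `record_rename` and `resolve` are hypothetical names chosen for illustration, not part of the framework.

```python
import json
import os
import tempfile

# Hypothetical descriptor file name; the paper stores RDF, JSON is a stand-in.
DESCRIPTOR = ".folder_index.json"


def load_descriptor(folder):
    """Read the folder's descriptor, or return an empty one if none exists."""
    path = os.path.join(folder, DESCRIPTOR)
    if not os.path.exists(path):
        return {"renames": {}}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_descriptor(folder, descriptor):
    """Write the descriptor back into the folder it describes."""
    with open(os.path.join(folder, DESCRIPTOR), "w", encoding="utf-8") as fh:
        json.dump(descriptor, fh, indent=2)


def record_rename(folder, old_name, new_name):
    """Rename a file and record old name -> new name in the folder descriptor."""
    os.rename(os.path.join(folder, old_name), os.path.join(folder, new_name))
    descriptor = load_descriptor(folder)
    descriptor["renames"][old_name] = new_name
    save_descriptor(folder, descriptor)


def resolve(folder, name):
    """Return the current path for a possibly stale name, or None if it is gone.

    Follows the recorded rename chain until an existing file is found.
    The `seen` set guards against cycles (e.g. a file renamed back and forth).
    """
    renames = load_descriptor(folder)["renames"]
    seen = set()
    while name not in seen:
        path = os.path.join(folder, name)
        if os.path.exists(path):
            return path
        seen.add(name)
        if name not in renames:
            return None
        name = renames[name]
    return None


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    open(os.path.join(folder, "report.txt"), "w").close()
    record_rename(folder, "report.txt", "report_v2.txt")
    record_rename(folder, "report_v2.txt", "final.txt")
    # A stale reference to the original name still resolves to the current file.
    print(resolve(folder, "report.txt"))
```

An application holding the old name `report.txt` calls `resolve` instead of opening the path directly, so a rename no longer leaves the reference dangling; a deleted file yields `None` rather than a failed open.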
With this metadata in place, the user does not have to remember file name or file location changes, which makes the file retrieval system intelligent. The proposed framework will further incorporate file system changes automatically. That is, in the existing environment, when files are copied from one location to another the index must be rebuilt manually to bring it up to date. But