IRJMST Vol 7 Issue 2 [Year 2016] ISSN 2250 1959 (Online) 2348 9367 (Print)
International Research Journal of Management Science & Technology http://www.irjmst.com Page 19

RANK BASED CONTENT MINING

ANKITA GUPTA

ABSTRACT

Finding relevant data, or exactly what an individual needs, within a large body of data is a difficult task. For a newcomer to research, it is very hard to reach the required and relevant results. When searching for research papers on the internet, one needs to know either the authors of the paper or its exact title. It is not possible to get appropriate results if a person has only basic knowledge of the topic or does not know the exact author name. We therefore propose a ranking-based search in this paper to retrieve the most relevant results.

KEYWORDS: stemming, ranking, ATM, TAM

INTRODUCTION

With the advent of the Web and various specialized digital libraries, the automatic extraction of useful information from text has become an increasingly important research area in web mining.

Fig 1.1 Areas of web mining

Web mining is divided into three areas according to the kind of Web data used as input to the data mining process: Web Content Mining (WCM), Web Usage Mining (WUM) and Web Structure Mining (WSM).

Web Content Mining (WCM)

Web Content Mining is the process of extracting useful information from the contents of web documents. The web documents may consist of text, images, audio, video or structured records such as tables and lists. Mining can be applied to the web documents themselves as well as to the result pages produced by a search engine.

Libraries are now facing a crucial transitional state, which necessitates adopting modern methods of knowledge organization. The advent of Information Technology offers a powerful means of translating intellectual contributions into value-added information products that satisfy the specific requirements of each end user.
The implications of this phenomenon are reflected in a gradual transformation of the physical format of information resources in libraries. The concept of the multimedia library was introduced when audio-visual materials were added alongside printed media. The next stage was marked by the digitization of information, and the digital library came into vogue. A digital library does not merely mean converting information into digital form; it is an asset that facilitates the free exchange of information at a global level through the Internet. This led to the metamorphosis of the library into a virtual library.

1.1 MOTIVATION

The inspiration for this project comes particularly from CiteSeerX, a digital library and repository for scientific and academic papers with a focus on computer and information science. There are also other interesting online academic literature repositories, such as the local Highwire Press, an offshoot of the Stanford libraries, and Google Scholar. The proposed system will be useful to researchers because, at the outset, they do not know the precise research areas to search or the associated papers they could work on, nor do they know the sequence of an author's papers in a given field. The system assists researchers in finding the papers they are interested in.

1.2 PROBLEM STATEMENT

Our project aims at describing the various relations between topic and author-name within the documents. We are looking for a