An Efficient Algorithm for Topic Ranking and Modeling Topic Evolution

Kumar Shubhankar, Aditya Pratap Singh, Vikram Pudi
Center for Data Engineering, International Institute of Information Technology, Hyderabad, India
{shubankar, aditya_pratap}@students.iiit.ac.in, vikram@iiit.ac.in

Abstract. In this paper we introduce a novel and efficient approach to detecting and ranking topics in a large corpus of research papers. With the rapidly growing size of academic literature, topic detection and topic ranking have become challenging tasks. We present an approach that uses closed frequent keyword-sets to form topics. We devise a modified, time-independent PageRank algorithm that assigns an authoritative score to each topic by considering the sub-graph in which the topic appears, producing a ranked list of topics. Using the citation network and making the topic ranking algorithm time invariant reveals interesting results. Our approach also provides a clustering technique for research papers that uses topics as the similarity measure. We extend our algorithms to study various aspects of topic evolution, which gives insight into trends in research areas over time. Our algorithms also detect hot topics and landmark topics over the years. We test our algorithms on the DBLP dataset and show that they are fast, effective and scalable.

Keywords: Closed Frequent Keyword-set, Topic Ranking, Citation Network, Authoritative Score, Evolution

1 Introduction

The ever-growing size of academic literature and rapidly changing fields of research make it challenging for a researcher to identify significant research topics over time. Topic discovery has recently attracted considerable research interest [13], [14], [15]. In this paper, we propose a novel and efficient method to detect and rank research topics.
Based on the intuition that a document is well summarized by its title, which gives a good high-level description of its content, we use the keywords present in the title of a paper to detect topics. We form closed frequent keyword-sets from the phrases present in paper titles, subject to a user-defined minimum support, and treat these sets as topics. We propose a time-independent, modified iterative PageRank [3] algorithm to assign an authoritative score to each paper. For a topic T, we consider all the research papers containing that topic and the citation edges of these papers. We then assign an authoritative score to each topic using the scores of the papers containing it. Our topic ranking algorithm therefore ranks topics by their significance in the research community rather than by their popularity, which reflects only how frequently a topic occurs.

All the papers sharing a topic form a natural cluster. Note that a paper can belong to several clusters, so the clusters are hierarchical and overlapping. Considering topics at year-wise granularity, we model the evolution of topics over time. We apply this evolution model to First Topic Detection and to finding Landmark Topics and Fading Topics. Our algorithms have many applications, such as topic recommendation systems for authors, trend analysis, and topic search systems.

We tested our algorithms on the DBLP dataset. Our experiments produced a ranked set of topics that, on examination by field experts and based on our own study, match the prominent topics in the dataset over the timeline.
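
The pipeline outlined above can be illustrated with a minimal Python sketch that mines closed frequent keyword-sets from titles and scores each one from per-paper PageRank values. It is illustrative only and rests on several assumptions that are not part of the method described here: the stopword list, all function names, the `max_size` cap on keyword-set size, the use of standard PageRank over the whole citation graph (the time-independence modification and the per-topic sub-graph are not reproduced), and the averaging of paper scores into a topic score.

```python
from collections import Counter
from itertools import combinations

# Illustrative stopword list (an assumption; not taken from the paper).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def keywords(title):
    """Return the set of lowercase, non-stopword tokens in a title."""
    return frozenset(w for w in title.lower().split() if w not in STOPWORDS)


def closed_frequent_keyword_sets(titles, min_support, max_size=3):
    """Map each closed frequent keyword-set (up to max_size words) to its support."""
    support = Counter()
    for title in titles:
        words = sorted(keywords(title))
        for k in range(1, max_size + 1):
            for combo in combinations(words, k):
                support[frozenset(combo)] += 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    # A set is closed if no proper superset has the same support.
    # Closedness is only checked against sets of at most max_size words.
    return {s: c for s, c in frequent.items()
            if not any(s < t and frequent[t] == c for t in frequent)}


def pagerank(citations, damping=0.85, iterations=50):
    """Standard PageRank; citations maps a paper to the papers it cites."""
    nodes = set(citations) | {q for cited in citations.values() for q in cited}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in nodes if not citations.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p, cited in citations.items():
            for q in cited:
                new[q] += damping * rank[p] / len(cited)
        rank = new
    return rank


def rank_topics(titles_by_paper, citations, min_support):
    """Rank topics by the mean PageRank of their papers (assumed aggregation)."""
    topics = closed_frequent_keyword_sets(titles_by_paper.values(), min_support)
    scores = pagerank(citations)
    ranked = []
    for topic in topics:
        # The papers whose titles contain the topic form its cluster.
        members = [p for p, t in titles_by_paper.items() if topic <= keywords(t)]
        score = sum(scores.get(p, 0.0) for p in members) / len(members)
        ranked.append((score, sorted(topic), sorted(members)))
    return sorted(ranked, reverse=True)


# Toy data (hypothetical titles and citation edges).
titles = {
    "p1": "Mining Frequent Patterns",
    "p2": "Frequent Patterns in Data Streams",
    "p3": "Topic Models for Data Streams",
    "p4": "Ranking Topic Models",
}
citations = {"p2": ["p1"], "p3": ["p1", "p2"], "p4": ["p3"]}

for score, topic, members in rank_topics(titles, citations, min_support=2):
    print(f"{score:.3f}  {topic}  {members}")
```

In this toy corpus each single keyword is absorbed into a two-word set with the same support, which is why only the closed sets survive as topics. The brute-force subset enumeration is adequate for short titles but would need a dedicated closed-itemset miner on a corpus the size of DBLP.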