Micro-blog Keyword Extraction Method Based on Graph Model and Semantic Space

Hua Zhao and Qingtian Zeng
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Email: doctorhuazhao@yahoo.com.cn, qtzeng@163.com

Abstract: Keyword extraction has been studied extensively for many specific domains, but keyword extraction for micro-blogs is just beginning. This paper studies keyword extraction from Chinese micro-blogs. Taking into account the characteristics of micro-blogs, such as short length and topic divergence, we propose a Chinese micro-blog keyword extraction method that combines multiple features. First, we build a graph model from the co-occurrence of words and derive a word weight from it. Because the graph-based weight is sometimes identical for different words, we next build a semantic space using a topic detection method and compute a statistical weight from that space. Finally, we take the location of words into account during extraction, which proves to be a very effective feature. Experimental results show that the proposed keyword extraction method is effective.

Index Terms: Micro-Blog, Keyword Extraction, Graph Model, Semantic Space

I. INTRODUCTION

Micro-blog is a social networking application that provides users with a platform for sharing, broadcasting and acquiring information [1]. It helps users connect with other micro-blog users around the globe. Micro-bloggers can post any kind of information they are interested in to share with others. A micro-blog post is also a short text, limited in length to 140 characters. As more and more people use micro-blogs, users are becoming overwhelmed by the raw data, and many researchers have worked on this problem.
Research on micro-blogs has attracted increasing attention from researchers in many fields, including Natural Language Processing (NLP) and communication.

Keyword extraction is a subtask of information extraction whose goal is to automatically extract relevant terms from a given corpus. It plays an important role in many NLP studies [2] and is a basic step for text classification, text clustering and so on. Although there has been much research on keyword extraction, keyword extraction from micro-blogs is just beginning, especially for Chinese micro-blogs.

In this paper, we address Chinese micro-blog keyword extraction, where keywords are defined as the words that can represent the content of the micro-blog. The extracted keywords can be used in many applications, for example user interest modeling and hot topic tracking. The emphasis of our work is on how to extract keywords effectively from a single micro-blog text. Taking into account the characteristics of micro-blogs, such as short length and topic divergence, we propose a keyword extraction method based on the fusion of three features: the graph model, the statistical weight and the location feature, where the graph model is based on TextRank. Based on our observation that users usually publish several micro-blog posts when they visit a place or take part in an activity, and that these posts relate to the same topic, we propose to create a semantic space to compute the statistical weight. Experimental results show that the proposed method is effective.

The structure of the paper is as follows. Section 2 gives a short overview of related research. Section 3 presents the method for creating the graph model and the word weight computation method based on it.
Section 4 covers the word weight computation method based on the semantic space. Section 5 gives the keyword extraction method based on the fusion of the multiple features. Section 6 discusses the experimental results and analysis. Section 7 gives the conclusions drawn from our work.

II. RELATED WORK

A. Related Work of Keyword Extraction

Keywords are very important in information retrieval and automatic summarization, so keyword extraction has long been a hot topic in NLP. Researchers have studied extraction methods for many specific domains, for example web texts [3], meeting transcripts [4] [5], scientific publications [6] and semantic annotations [7], and have made considerable progress. Other researchers have carried out interesting work based on the extracted keywords [8]-[9]. Overall, there are two kinds of methods [10]: supervised and unsupervised. The main idea of the former is to train a keyword extraction model based on features such as part of speech and location, and then use the model to extract the keywords from the micro- JOURNAL OF MULTIMEDIA, VOL. 8, NO. 5, OCTOBER 2013 611 © 2013 ACADEMY PUBLISHER doi:10.4304/jmm.8.5.611-617