Micro-blog Keyword Extraction Method Based
on Graph Model and Semantic Space
Hua Zhao and Qingtian Zeng
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590,
China
Email: doctorhuazhao@yahoo.com.cn, qtzeng@163.com
Abstract—Keyword extraction has been studied for many
specific domains, but keyword extraction for micro-blogs is
only beginning. This paper studies keyword extraction from
Chinese micro-blogs. Taking into account the characteristics
of micro-blogs, such as short length and topic divergence,
we propose a Chinese micro-blog keyword extraction method
that combines multiple features. First, we build a graph
model from the co-occurrence between words and compute a
word weight from that graph. Because different words
sometimes receive the same graph-based weight, we then
build a semantic space using a topic detection method and
compute a statistical weight from it. Finally, we take the
location of words into account during extraction, which
proves to be a very effective feature. Experimental results
show that the proposed keyword extraction method is
effective.
Index Terms—Micro-Blog, Keyword Extraction, Graph
Model, Semantic Space
I. INTRODUCTION
Micro-blog is a social networking application that
provides users with a platform for sharing, broadcasting,
and acquiring information [1]. It helps users connect
with other micro-blog users around the globe.
Micro-bloggers can post any kind of information they
are interested in to share with others. A micro-blog is
also a kind of short text, with its length limited to 140
characters. More and more people now use micro-blogs,
and users are becoming overwhelmed by the raw data.
Much research has been carried out to address this
problem, and micro-blog research has attracted increasing
attention from researchers in many fields, including
Natural Language Processing (NLP) and communication.
Keyword extraction is a subtask of information
extraction whose goal is to automatically extract relevant
terms from a given corpus. Keyword extraction plays an
important role in much Natural Language Processing
research [2] and is a basic step for tasks such as text
classification and text clustering. Although there has
been much research on keyword extraction, keyword
extraction from micro-blogs, especially Chinese
micro-blogs, is only beginning. In this paper, we study
keyword extraction from Chinese micro-blogs, where
keywords are defined as the words that can represent the
content of a micro-blog. The extracted keywords can be
used in many applications, for example, user interest
modeling and hot topic tracking.
The emphasis of our work is how to extract keywords
effectively from a single micro-blog text. Taking into
account the characteristics of micro-blogs, such as short
length and topic divergence, we propose a keyword
extraction method based on the fusion of three features:
a graph model, a statistical weight, and a location
feature, where the graph model is based on TextRank.
Based on our observation that users usually publish
several micro-blogs when they go to a place or take part
in a party, and that these micro-blogs relate to the same
topic, we propose to create a semantic space to compute
the statistical weight. Experimental results show that
the proposed method is effective.
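As an illustration only, the graph-model feature can be sketched
as a TextRank-style ranking over a word co-occurrence graph. This
is a minimal sketch under assumptions, not the implementation used
in this paper: the function names, the co-occurrence window, the
damping factor of 0.85, the fixed iteration count, and the sample
words are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=2):
    """Link two distinct words if they occur within `window` positions.

    `window` is an assumed parameter; the graph is undirected and unweighted.
    """
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            other = words[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Iterate the unweighted TextRank update for a fixed number of rounds."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[n] / len(graph[n]) for n in graph[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated
    return scores


# Hypothetical, already-segmented micro-blog text.
words = ["weather", "qingdao", "beach", "qingdao", "sea"]
graph = build_cooccurrence_graph(words, window=1)
scores = textrank(graph)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 4))
```

In this toy input, "weather", "beach", and "sea" each link only to
"qingdao", so they receive identical graph-based scores. Ties of
this kind are what motivate a second, statistical weight computed
from the semantic space.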
The structure of the paper is as follows. Section 2 gives
a short overview of related research. Section 3 presents
the method to create the graph model and the word
weight computation method based on the graph model.
Section 4 covers the word weight computation method
based on the semantic space. Section 5 gives the keyword
extraction method based on the fusion of multiple
features. Section 6 discusses the experimental results and
analysis. Section 7 gives the conclusions drawn from
our work.
II. RELATED WORK
A. Related Work of Keyword Extraction
Keywords are very important in information retrieval
and automatic summarization, so keyword extraction
has always been a hot topic in NLP. Researchers have
studied extraction methods for many specific domains,
for example, web texts [3], meeting transcripts [4] [5],
scientific publications [6], and semantic annotations [7],
and have achieved many results. Other researchers have
carried out interesting work based on the extracted
keywords [8]-[9].
Overall, keyword extraction methods fall into two
categories [10]: supervised and unsupervised. The main
idea of the former is to train a keyword extraction model
on features such as part of speech and location, and then
use the model to extract the keywords from the micro-
JOURNAL OF MULTIMEDIA, VOL. 8, NO. 5, OCTOBER 2013 611
© 2013 ACADEMY PUBLISHER
doi:10.4304/jmm.8.5.611-617