S.-K. Chai, J.J. Salerno, and P.L. Mabry (Eds.): SBP 2010, LNCS 6007, pp. 180–188, 2010.
© Springer-Verlag Berlin Heidelberg 2010
COLBERT: A Scoring Based Graphical Model for
Expert Identification
Muhammad Aurangzeb Ahmad and Xin Zhao
Department of Computer Science and Engineering, University of Minnesota
mahmad@cs.umn.edu, zhao0111@umn.edu
Abstract. In recent years a number of graphical models have been proposed for
topic discovery in various contexts and for network analysis. However, there is one
class of document corpus, documents with ratings, where the problem of topic
discovery has not been explored in much detail. In such corpora, reviews and
ratings are available in addition to the documents themselves. In this paper we
address the problem of discovering latent structures in document-review corpora,
which can then be used to construct a social network of experts. We present a
graphical model, COLBERT, that automatically discovers latent topics based on the
contents of a document, the reviews of the document, and the ratings of those reviews.
Keywords: Expert Identification, Topic Modeling, COLBERT.
1 Introduction
Graphical models for discovering latent structure in document corpora have been
applied in a number of different settings. Given a document corpus, one can
discover latent topics in the corpus based on the content of the documents and on
additional information, if it is available. The relationships that are discovered can be
one-to-one (author and document), one-to-many (multiple authors for the same topic)
or many-to-many (multiple authors for multiple topics). Examples of such
relationships and their respective contexts include document corpora [21][5][13]
and e-mail datasets [12]. In this paper we present a graphical model, the COLBERT
(COrrelated Latent BEhavior Related Topic) Model, which discovers latent topics in
document categories by taking into account document reviewers and the ratings of the
reviews. By taking into account ‘groups’ formed by the reviewers, we also exploit the
social network for the topic discovery task. For our experiments we use the Epinions
dataset, which consists of product reviews and ratings of those reviews. We note that
product category discovery is analogous to topic discovery in a document corpus.
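To make the three input signals (document content, reviews, and ratings of reviews) concrete, the following is a minimal Python sketch of an Epinions-style document-review-rating corpus. It assumes a hypothetical in-memory layout: the `Product`, `Review`, and `reviewer_scores` names and the integer rating scale are illustrative choices, not the authors' representation. The helper computes the mean rating each reviewer receives within each category; this is a naive baseline for illustration only and is not the COLBERT model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Review:
    reviewer: str
    text: str
    # Scores other users gave this review (hypothetical integer scale).
    ratings: list[int] = field(default_factory=list)


@dataclass
class Product:
    name: str
    category: str      # e.g. "Books"; sub- and super-categories are omitted here
    description: str   # the "document" content
    reviews: list[Review] = field(default_factory=list)


def reviewer_scores(products: list[Product]) -> dict[tuple[str, str], float]:
    """Naive baseline (not COLBERT): mean rating each reviewer receives per category."""
    received: dict[tuple[str, str], list[int]] = defaultdict(list)
    for product in products:
        for review in product.reviews:
            received[(product.category, review.reviewer)].extend(review.ratings)
    # Skip (category, reviewer) pairs whose reviews received no ratings.
    return {key: mean(vals) for key, vals in received.items() if vals}


corpus = [
    Product("Dune", "Books", "A science fiction novel.",
            [Review("alice", "Great read.", [5, 4]), Review("bob", "Too long.", [2])]),
    Product("Vim", "Software", "A text editor.",
            [Review("alice", "Solid editor.", [3])]),
]
print(reviewer_scores(corpus))
# {('Books', 'alice'): 4.5, ('Books', 'bob'): 2, ('Software', 'alice'): 3}
```

A per-category average like this ignores the text of documents and reviews entirely; a latent topic model such as COLBERT instead uses document content, review content, and ratings together.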
Epinions is a website that contains information about a large number of products,
ranging from books to movies to software. These products are grouped into
categories, sub-categories and super-categories. Users of Epinions can post reviews
of these products, and the quality of these reviews can in turn be evaluated by other
reviewers. Additionally, the website stores a wide range of information related to the
products and the users, e.g., user’s