S.-K. Chai, J.J. Salerno, and P.L. Mabry (Eds.): SBP 2010, LNCS 6007, pp. 180–188, 2010. © Springer-Verlag Berlin Heidelberg 2010

COLBERT: A Scoring Based Graphical Model for Expert Identification

Muhammad Aurangzeb Ahmad and Xin Zhao
Department of Computer Science and Engineering, University of Minnesota
mahmad@cs.umn.edu, zhao0111@umn.edu

Abstract. In recent years a number of graphical models have been proposed for topic discovery and network analysis in various contexts. However, there is one class of document corpus, documents with ratings, where the problem of topic discovery has not been explored in much detail. In such corpora, reviews and ratings of the documents are available in addition to the documents themselves. In this paper we address the problem of discovering latent structure in a document-review corpus, which can then be used to construct a social network of experts. We present a graphical model, COLBERT, that automatically discovers latent topics based on the contents of the document, the review of the document and the ratings of the review.

Keywords: Expert Identification, Topic Modeling, COLBERT.

1 Introduction

Graphical models for discovering latent structure in document corpora have been applied in a number of different settings. Given a document corpus, one can discover latent topics in the corpus based on the content of the documents and on additional information if it is available. The relationships that are discovered can be one to one (author and document), one to many (multiple authors for the same topic) or many to many (multiple authors for multiple topics). Examples of such relationships and their respective contexts include document corpus data [21][5][13] and e-mail datasets [12]. In this paper we present a graphical model, the COLBERT (COrrelated Latent BEhavior Related Topic) Model, which discovers latent topics in document categories by taking into account document reviewers and the ratings of the reviews.
By taking into account the ‘groups’ formed by the reviewers, we also exploit the social network for the topic discovery task. We use the Epinions dataset, which consists of product reviews and ratings of those reviews, for our experiments. We note that product category discovery is analogous to topic discovery in a document corpus. Epinions is a website which contains information about a large number of products ranging from books to movies to software. These products are grouped together into categories, sub-categories and super-categories. Users of Epinions can post reviews of these products. The quality of these reviews can be further evaluated by other reviewers. Additionally, the website stores a wide range of information related to the products and the users, e.g., user’s