Learning Online Discussion Structures by Conditional Random Fields

Hongning Wang, Chi Wang, ChengXiang Zhai, Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801 USA
{wang296, chiwang1, czhai, hanj}@cs.uiuc.edu

ABSTRACT
Online forum discussions are emerging as a valuable information repository, where knowledge is accumulated through interaction among users, producing multiple threads with structure. The replying structure within each thread conveys important information about the discussion content. Unfortunately, not all online forum sites explicitly record these replying relationships, which makes it hard for both users and computers to digest the information buried in a discussion thread.

In this paper, we propose a probabilistic model in the Conditional Random Fields framework to predict the replying structure of a threaded online discussion. Unlike previous replying relation reconstruction methods, most of which fail to consider the dependency between posts, we cast the problem as a supervised structure learning problem. This lets us incorporate features that capture the structural dependency and learn their relationship. Experimental results on three different online forums show that the proposed method captures the replying structures in online discussion threads well, and that multiple tasks, such as forum search and question answering, can benefit from the reconstructed replying structures.

Categories and Subject Descriptors
I.5.1 [Pattern Recognition]: Models - Statistical

General Terms
Algorithms, Measurement, Experimentation

Keywords
Threaded Discussion, Replying Relation Reconstruction, Structure Learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGIR’11, July 24–28, 2011, Beijing, China.
Copyright 2011 ACM 978-1-4503-0757-4/11/07 ...$10.00.

1. INTRODUCTION
With the development of Web 2.0, more and more people take advantage of online forum discussions to freely share and exchange their thoughts and knowledge. Valuable knowledge and information on various topics, e.g., sports, health, and entertainment, has been accumulated through this collaborative content contribution. More importantly, such knowledge can hardly be found on general websites or in encyclopedias, which makes forums a unique and valuable resource for extracting useful knowledge to facilitate other information seeking tasks, including forum search [2, 13], question answering [4, 6], and expert finding [21, 8].

[Figure 1: A sample threaded discussion from Apple Discussions. Posts 0 to 4, all dated Jan 6, 2011, are shown along a time line: a root post by deesto about being unable to boot into OS X after updating a MacBook Pro from 10.6.5 to 10.6.6, followed by replies from a brody, deesto, Frank Miller2, and deesto.]

A typical online forum discussion originates from a root post, which introduces a topic for the subsequent discussion, e.g., a system failure in Mac OS X in Apple Discussions, as shown in Figure 1. Followers read the existing messages and reply to the post they are aware of or most interested in. From a temporal perspective, the replying posts form a chain structure, or thread (as shown by the time line in Figure 1). A reply can respond to any preceding post, so branches of discussion form as more users join in and comment. As a result, from a semantic perspective the discussion thread grows into a tree structure: each non-root post has exactly one “reply-to” post, while a post can be replied to by multiple posts. The semantic tree structure is helpful both for humans to digest the discussion content and for automatic methods to ex-