JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(8):1195–1209, 2008
Received April 26, 2007; revised July 6, 2007; accepted November 12, 2007
© 2008 ASIS&T. Published online 24 March 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20827

Despite the rapid growth of text-based computer-mediated communication (CMC), its limitations have rendered the media highly incoherent. This poses problems for content analysis of online discourse archives. Interactional coherence analysis (ICA) attempts to accurately identify and construct CMC interaction networks. In this study, we propose the Hybrid Interactional Coherence (HIC) algorithm for identifying Web forum interaction. HIC utilizes a variety of system and linguistic features, including message header information, quotations, direct address, and lexical relations. Furthermore, several similarity-based methods, including a Lexical Match Algorithm (LMA) and a sliding window method, are utilized to account for interactional idiosyncrasies. Experimental results on two Web forums revealed that the proposed HIC algorithm significantly outperformed comparison techniques in terms of precision, recall, and F-measure at both the forum and thread levels. Additionally, an example is used to illustrate how the improved ICA results can facilitate enhanced social network and role analysis capabilities.

Introduction

Computer-Mediated Communication (CMC) is any form of communication between two or more individuals who interact with and influence each other via computer-supported media. Text-based modes of CMC include e-mail, listservs, forums, chatrooms, instant messaging, and the World Wide Web (Herring, 2002). The popularity of CMC continues to grow. E-mail, Web forums, newsgroups, and chatrooms have already become essential parts of our daily lives, providing a communication medium for various activities (Meho, 2006; Radford, 2006).
Although the ubiquitous nature of CMC provides a convenient mechanism for communication, it is not without its shortcomings. The fragmented, ungrammatical, and interactionally disjointed nature of CMC discourse, attributable to the limitations of CMC media, has rendered CMC highly incoherent (Hale, 1996). Beaugrande and Dressler (1996) defined coherence in linguistics as a “continuity of senses” and “the mutual access and relevance within a configuration of concepts and relations.” For Web discourse, coherence defines the macro-level semantic structure (Barzilay & Elhadad, 1997). Barzilay and Elhadad further pointed out that “coherence is represented in terms of coherence relations between text segments, such as elaboration, cause and explanation.” Correspondingly, the coherence of online discourse is represented in terms of the reply-to relations between CMC messages. Reply-to relationships can serve several functions, such as elaborating on or complementing previous postings, greeting fellow users, answering questions, or disputing previous messages.

Computer-Mediated Interaction (CMI) refers to the social interaction between CMC users (Walther, Anderson, & Park, 1994). Such social interaction is built through the reply-to relationships between messages; we therefore also refer to the reply-to relationship as the interaction relationship between messages. A social interaction in online discourse occurs when a user posts a message that has a reply-to relation with another user’s message. Occasionally, a user may interact with other users without specifying the message he or she is responding to; common greeting messages such as “Hi Jatt” are examples. In such cases, we can construct an artificial reply-to relationship between the message and the addressed user’s nearest message. This does not affect the social interaction relationships between the users, because the artificial link still connects the same pair of users.
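The linking rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the HIC algorithm itself: messages in a thread are assumed to be listed in posting order; the `Message` fields `reply_to` and `addressed_user` and the functions `resolve_parents` and `interaction_edges` are hypothetical names; and the addressed user’s “nearest message” is interpreted as that user’s most recent preceding message. Explicit reply-to links are kept as-is, a direct-address message without one is attached to the addressed user’s latest message, and user-to-user interaction edges are then counted from the resulting links.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    msg_id: str
    author: str
    reply_to: Optional[str] = None        # explicit parent message id, if known
    addressed_user: Optional[str] = None  # hypothetical field, e.g. "Jatt" in "Hi Jatt"


def resolve_parents(thread: List[Message]) -> Dict[str, str]:
    """Map each message id to a parent id; assumes messages are in posting order."""
    parents: Dict[str, str] = {}
    last_by_author: Dict[str, str] = {}  # most recent message id per author so far
    for msg in thread:
        if msg.reply_to is not None:
            parents[msg.msg_id] = msg.reply_to
        elif msg.addressed_user in last_by_author:
            # Artificial link (assumption): attach to the addressed user's
            # most recent preceding message. Users who have not posted yet
            # cannot be linked, so such messages stay unlinked.
            parents[msg.msg_id] = last_by_author[msg.addressed_user]
        last_by_author[msg.author] = msg.msg_id
    return parents


def interaction_edges(thread: List[Message]) -> Counter:
    """Count directed (replier, replied-to user) interactions implied by the links."""
    author_of = {m.msg_id: m.author for m in thread}
    edges: Counter = Counter()
    for child, parent in resolve_parents(thread).items():
        if parent in author_of:  # ignore links to messages outside this thread
            edges[(author_of[child], author_of[parent])] += 1
    return edges


# Usage: m4 greets Jatt ("Hi Jatt") without an explicit reply-to header.
thread = [
    Message("m1", "Jatt"),
    Message("m2", "Ravi", reply_to="m1"),
    Message("m3", "Jatt", reply_to="m2"),
    Message("m4", "Kiran", addressed_user="Jatt"),
]
print(resolve_parents(thread))
print(interaction_edges(thread))
```

Because the artificial link always points to one of the addressed user’s messages, the resulting user-level edge (here, Kiran to Jatt) is the same whichever of Jatt’s messages is chosen, which is why this fallback leaves the social interaction network unchanged.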
Since the reply-to relations between CMC messages can be used to build the social interaction between users, previous studies have also referred to CMC coherence as CMC interactional coherence (e.g., Herring, 1999). However, current CMC media suffer from the “disrupted turn adjacency” problem, and existing system functionalities do not provide sufficient reply-to information. In light of the incoherent and fragmented nature of text-based Web discourse, many researchers have pointed out the importance of automatically identifying CMC interactional coherence. Te’eni (2001) claimed that interactional coherence information is particularly important “when

A Hybrid Approach to Web Forum Interactional Coherence Analysis
Tianjun Fu, Ahmed Abbasi, and Hsinchun Chen
Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721. E-mail: futj@email.arizona.edu, aabbasi@email.arizona.edu, hchen@eller.arizona.edu