JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(8):1195–1209, 2008
Received April 26, 2007; revised July 6, 2007; accepted November 12, 2007
© 2008 ASIS&T
Published online 24 March 2008 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/asi.20827
Despite the rapid growth of text-based computer-mediated
communication (CMC), its limitations have rendered the
media highly incoherent. This poses problems for con-
tent analysis of online discourse archives. Interactional
coherence analysis (ICA) attempts to accurately identify
and construct CMC interaction networks. In this study,
we propose the Hybrid Interactional Coherence (HIC)
algorithm for identification of web forum interaction.
HIC utilizes a bevy of system and linguistic features,
including message header information, quotations, di-
rect address, and lexical relations. Furthermore, several
similarity-based methods including a Lexical Match
Algorithm (LMA) and a sliding window method are utilized
to account for interactional idiosyncrasies. Experimental
results on two web forums revealed that the proposed
HIC algorithm significantly outperformed comparison
techniques in terms of precision, recall, and F-measure
at both the forum and thread levels. Additionally, an
example was used to illustrate how the improved ICA
results can facilitate enhanced social network and role
analysis capabilities.
Introduction
Computer-Mediated Communication (CMC) is any form
of communication between two or more individuals who
interact and influence each other via computer-supported
media. Text-based modes of CMC include e-mail, listservs,
forums, chatrooms, instant messaging, and the World Wide
Web (Herring, 2002). There is no doubt that the popularity
of CMC is continuing to grow. E-mail, Web forums, news-
groups, and chatrooms have already become essential parts
of our daily lives, providing a communication medium for
various activities (Meho, 2006; Radford, 2006). Although
the ubiquitous nature of CMC provides a convenient
mechanism for communication, it is not without its short-
comings. The fragmented, ungrammatical, and interaction-
ally disjointed nature of CMC discourse, attributable to the
limitations of the CMC media, has rendered CMC highly
incoherent (Hale, 1996).
Beaugrande and Dressler (1996) defined coherence in
linguistics as a “continuity of senses” and “the mutual access
and relevance within a configuration of concepts and rela-
tions.” For Web discourse, coherence defines the macro-level
semantic structure (Barzilay & Elhadad, 1997). Barzilay and
Elhadad further pointed out that “coherence is represented in
terms of coherence relations between text segments, such as
elaboration, cause and explanation.” Coherence of online
discourse, correspondingly, is represented in terms of the
reply-to relations between CMC messages. The reply-to rela-
tionships can serve several functions, such as elaborating or
complementing previous postings, greeting fellow users,
answering questions, or oppugning previous messages.
Computer-Mediated Interaction (CMI) refers to the social
interaction between CMC users (Walther, Anderson, & Park,
1994). Such social interaction is built through the reply-to
relationships between messages. Therefore, we also refer
to the reply-to relationship as the interaction relationship
between messages. A social interaction in online discourse
happens if a user posts a message that has a reply-to relation
with other users’ messages. Occasionally, a user may inter-
act with other users without specifying the messages he or
she responds to; common greeting messages such as “Hi Jatt”
are examples. In such cases, we can build an artificial reply-to
relationship between the greeting and the addressed user’s nearest
message. Doing so does not alter the social interaction
relationships between the users.
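The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the HIC algorithm itself: the `Message` fields, the function names, and the users `UserB` and `UserC` are hypothetical, and "nearest message" is assumed here to mean the addressed user's most recent earlier message in the same thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: int                      # posting order within the thread
    author: str
    reply_to: Optional[int] = None   # explicit reply-to target, if known
    addressee: Optional[str] = None  # user named in the text, e.g. "Hi Jatt"


def resolve_reply_links(thread):
    """Map each message id to the id of the message it is taken to reply to."""
    links = {}
    for i, msg in enumerate(thread):
        if msg.reply_to is not None:
            links[msg.msg_id] = msg.reply_to
        elif msg.addressee is not None:
            # Assumption: "nearest" = the addressee's most recent earlier message.
            for prev in reversed(thread[:i]):
                if prev.author == msg.addressee:
                    links[msg.msg_id] = prev.msg_id
                    break
    return links


def user_interactions(thread, links):
    """Convert message-level links into directed (replier, recipient) user pairs."""
    author_of = {m.msg_id: m.author for m in thread}
    return {
        (author_of[child], author_of[parent])
        for child, parent in links.items()
        if author_of[child] != author_of[parent]  # only links to other users count
    }


thread = [
    Message(1, "Jatt"),
    Message(2, "UserB", reply_to=1),
    Message(3, "Jatt", reply_to=2),
    Message(4, "UserC", addressee="Jatt"),  # greeting with no explicit target
]
links = resolve_reply_links(thread)
print(links)
print(sorted(user_interactions(thread, links)))
```

In this toy thread, the greeting from `UserC` is linked to message 3, the nearest earlier message by `Jatt`. Linking it to message 1 instead would yield the same user-level pair, which is why the artificial link leaves the interaction relationships between users unchanged.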
Since the reply-to relations between CMC messages can
be used to build the social interaction between users, coher-
ence of CMC is also called CMC interactional coherence in
previous studies (e.g., Herring, 1999). However, current
CMC media suffer from the “disrupted turn adjacency” problem,
and existing system functionalities do not contain sufficient
reply-to information. In light of the incoherent and fragmented
nature of text-based Web discourse, many researchers
have pointed out the importance of automatically identifying
CMC interactional coherence. Te’eni (2001) claimed that inter-
actional coherence information is particularly important “when
A Hybrid Approach to Web Forum Interactional
Coherence Analysis
Tianjun Fu, Ahmed Abbasi, and Hsinchun Chen
Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson,
AZ 85721. E-mail: futj@email.arizona.edu, aabbasi@email.arizona.edu, hchen@eller.arizona.edu