In Submission. Do Not Quote or Cite Without Prior Permission From Authors.

Communicating with Action

Darren Gergle
Human-Computer Interaction Institute
Carnegie Mellon University
5000 Forbes Avenue
+1 412 268-7418
dgergle+@cs.cmu.edu

Robert E. Kraut
Human-Computer Interaction Institute
Carnegie Mellon University
5000 Forbes Avenue
+1 412 268-7694
robert.kraut@cmu.edu

Susan R. Fussell
Human-Computer Interaction Institute
Carnegie Mellon University
5000 Forbes Avenue
+1 412 268-4003
susan.fussell@cmu.edu

ABSTRACT
A shared visual workspace allows multiple people to see similar views of objects and environments. Prior empirical literature demonstrates that visual information helps collaborators understand the current state of their task and enables them to communicate and ground their conversations efficiently. We present an empirical study that demonstrates how visual information can replace, modify or augment speech in a shared visual environment. Pairs performed a referential communication task with and without a shared visual space. A detailed sequential analysis of the communicative content reveals that pairs with a shared workspace were less likely to explicitly verify their actions with speech. Rather, they relied on visual information to provide the necessary communicative and coordinative cues.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces – collaborative computing, computer-supported cooperative work.

General Terms
Design, Experimentation, Human Factors, Performance, Theory.

Keywords
Shared visual space, empirical studies, sequential analysis, language, and communication.

1. INTRODUCTION
A good portion of technology development for CSCW tacitly assumes that the primary goal is to support spoken language. For a large number of tasks, however, successful interaction does not rely solely on spoken language.
Rather, communicative information can be provided in the form of linguistic utterances, visual feedback, gestures, acoustic signals, or a host of other sources, all of which play an important role in successful communication. Everyday communication requires conversants to integrate these elements in an extremely rapid, flexible, real-time and cooperative fashion. Speakers generate and monitor their own activities; however, they also monitor the language and actions of their partners and take both into account as they speak.

Consider a group of architects, consultants and lay clients working together to discuss architectural plans for the design of a new corporate headquarters. Communication in the group is not merely composed of a series of individual utterances produced one at a time and presented for others to hear. Rather, speakers and addressees take into account what one another can see [36], notice where one another's attention is focused [1,6], point to objects in the space and say things like "that one" and "there" [4], use hand gestures, eye contact, and facial expressions, and share knowledge about previously spoken discourse and behavioral actions [9]. Many observational studies have demonstrated this rich interplay between speech and action in collaborative interactions [5,23,37].

Previous research has demonstrated the value of shared views of a workspace for collaboration on physical tasks [20,21,25,29,30,31]. These studies have uniformly found that participants in side-by-side settings, in which they share full views of one another and the workspace, perform better than participants using communication tools. Several recent studies [18,20,21,30] have further shown that pairs perform better when using video tools that provide views of the workspace than when using audio or text-based communication alone.
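The sequential analysis referred to in the abstract can be illustrated with a minimal sketch. A common first step in such analyses is a lag-1 transition table: given a sequence of coded communicative acts, estimate how likely each act type is to follow each other type. The code below is an illustrative sketch only, not the authors' actual analysis; the act codes and the sample dialogue are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(codes):
    """Lag-1 sequential analysis: estimate P(next act | current act)
    from a sequence of coded communicative acts."""
    pair_counts = Counter(zip(codes, codes[1:]))  # adjacent pairs
    totals = Counter(codes[:-1])                  # occurrences with a successor
    probs = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        probs[a][b] = n / totals[a]
    return dict(probs)

# Hypothetical coded dialogue: I = instruction, A = acknowledgment,
# V = explicit verbal verification
acts = ["I", "A", "I", "V", "A", "I", "A", "I", "V", "A"]
p = transition_probabilities(acts)
print(round(p["I"]["A"], 2))  # → 0.5: half of instructions are followed directly by an acknowledgment
```

Comparing such transition probabilities across conditions (for instance, how often an instruction is followed by an explicit verification with versus without a shared visual space) is the style of question a sequential analysis of communicative content addresses.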
Recently, there has been growing interest in the design of tools that allow collaborators to remotely perform tasks such as architectural planning. These activities, which we call collaborative physical tasks, involve intricate dependencies between verbal communication and physical actions. Telemedicine applications, remote repair systems, and collaborative design technologies are all examples of collaborative physical tasks. Any successful CSCW tool for remote collaboration on physical tasks will need to support the dependencies between speech and action found in these tasks.

To build tools that support collaborative physical tasks at a distance, we need a better understanding of the mechanisms through which the presence of a shared view of a workspace improves task performance. Although early research focused on assessing whether the presence of a shared visual space affected the quality of task performance, recent research has begun to fill in the details (see [11,12] for recent efforts in this direction). How, for example, does seeing a partner's gaze or actions alter a person's behavior? How does awareness that one is being watched influence one's own behavior?

Understanding the mechanisms by which visual information affects communication is essential for designing systems to support remote collaboration on physical tasks. By identifying how visual information and speech can influence and substitute for one another, we can make informed decisions about when and how to provide this visual information in CSCW tools.

This paper tests the hypothesis that a shared view of a workspace allows a pair completing a physical task to substitute action for language, and thus give instruction more efficiently. In the process of seeing whether a worker has completed an instruction correctly, an instructor also receives, as a side effect, accurate