Integrating cyberinfrastructure into existing e-Social Science research

Svenja Adolphs (1), Bennett Bertenthal (2), Steve Boker (3), Ronald Carter (1), Chris Greenhalgh (4), Mark Hereld (5, 6), Sarah Kenny (5), Gina-Anne Levow (7), Michael E. Papka (5, 6, 7), Tony Pridmore (4)

(1) Centre for Research in Applied Linguistics, School of English, Nottingham University, UK
(2) Department of Psychology, Indiana University, USA
(3) Department of Psychology, University of Virginia, USA
(4) School of Computer Studies and IT, Nottingham University, UK
(5) Computation Institute, Argonne National Laboratory and The University of Chicago, USA
(6) Mathematics and Computer Science Division, Argonne National Laboratory, USA
(7) Department of Computer Science, The University of Chicago, USA

Email address of corresponding author: svenja.adolphs@nottingham.ac.uk

Abstract. This study has been facilitated by an NSF/ESRC exchange programme between researchers at the University of Chicago and the University of Nottingham. At the University of Nottingham the National Centre for e-Social Science node seeks to explore, understand and demonstrate the salience of new forms of digital record as they emerge from and for e-Social Science. The Nottingham Multimodal Corpus (NMMC) is a corpus of multimodal data that marries established coding schemes with visual mark-up systems to foster a richer understanding of the embodied nature of language use and its manifold relations to the production of distinctive social contexts. At the University of Chicago, the NSF-funded Social Informatics Data Grid (SIDGrid)¹ is being built to enable researchers to collect real-time multimodal behaviour at multiple time scales. Multimedia data (voice, video, images, text, numerical) is stored in a distributed data warehouse that employs Web and Grid services to support data storage, access, exploration, annotation, integration, analysis, and mining of individual and combined data sets.
With particular reference to the analysis and markup of hand gestures in spoken discourse, this paper explores some basic steps in integrating cyberinfrastructure into existing e-Social Science research, from the standpoint of an interdisciplinary team with perspectives from linguistics, psychology, data and computing systems, and machine analysis of multimodal data.

¹ SIDGrid: sidgrid.ci.uchicago.edu

Introduction

Information technology advances have already made it possible to develop multi-million word databases or ‘corpora’ of spoken conversation as well as software tools to analyse this