Use of Large Databases for Group Projects at the Nexus of Teaching and Research

Richard C Thomas
Computer Science & Software Engineering
The University of Western Australia
M002, 35 Stirling Hwy, CRAWLEY 6009, Australia
+61 8 6488 2733
richard@csse.uwa.edu.au

Rebecca Mancy
Centre for Science Education, Faculty of Education
The University of Glasgow
Glasgow G3 6NH, Scotland
+44 7967 730987
rebeccamancy@dcs.gla.ac.uk

ABSTRACT
Final year, group (capstone) projects in computing disciplines are often expected to fill multiple roles: in addition to allowing students to learn important domain-specific knowledge, they should reinforce computing and software engineering concepts and provide for the acquisition of transferable skills. For motivational and pedagogical reasons, it is clearly preferable that such projects respond to real needs, be those in research or industry. We describe two student projects based on a large repository of usage data and integrated into a course in Professional Computing. These projects fulfilled the objectives outlined above and were closely linked to the research of the first author. We suggest that similar projects based on large databases may offer a transferable paradigm for others to follow. Finally, we outline some important elements for a successful group project based on a large database.

Categories and Subject Descriptors
H3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Query formulation.
H2.8 [Database Management]: Database Applications – Data mining, Scientific databases.
K3.2 [Computer and Information Science Education]: Computer Science Education

General Terms
Measurement, Experimentation, Human Factors.

Keywords
GRUMPS, SQL, stream data, keystroke times, capstone course.

1. INTRODUCTION
Capstone courses are typically compulsory, final year courses that provide a significant integrative, educational experience [1]. They may, for example, take the form of a large group software engineering project.
They have long been recognised as important in the teaching of computer science, and indeed many professional accreditation requirements state that such courses and projects must be undertaken. However, finding suitable projects can be difficult, as multiple requirements have to be satisfied. Project work should be a learning experience for students, both in terms of the knowledge gained and consolidated and in allowing them to acquire generic skills such as teamwork and time management. Furthermore, projects should be seen to be useful, either to research or in a real-world, commercial situation; contrived assignments usually result in a lack of student motivation.

On the technical side, data repositories are increasing in size and also in their range of applications. For example, terabyte databases are being accumulated for the Sloan Digital Sky Survey project in astronomy [4]. In human-computer interaction, relational databases, rather than flat files, are being used to store log data. We have found that the data thus generated is useful for group projects in a final year Professional Computing course. This paper describes such projects and the nature of the benefits to students, as well as the interplay between research and teaching that has given rise to this work.

2. LARGE DATASETS
Recent advances in technology and the falling cost of data storage mean that it is becoming feasible to log and use large quantities (gigabytes) of recorded data. This is apparent in the research domain, for example in the Sloan Digital Sky Survey project in astronomy [4], but is similarly true in the commercial arena, where companies retain such information as full transaction details, with the aim of analysing customer purchase patterns [9]. As large databases become more common, it is important that universities offer students the opportunity to gain the understanding, skills and techniques necessary to work in these domains.
More generally, use of large datasets allows students to appreciate the importance of optimisation, as system capacity becomes a limiting factor in execution time. It is therefore desirable that students work directly with datasets of this order of magnitude.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ITiCSE ’04, June 28-30, 2004, Leeds, UK. Copyright 2004 ACM 1-58113-000-0/00/0000…$5.00.

3. GRUMPS – A LARGE DATASET

3.1 Introduction to GRUMPS
The Generic Remote Usage Measurement Production System (GRUMPS) [13] is being developed at Glasgow University [2]. The goal is to provide general purpose mechanisms for the