Towards a Syllabus Repository for Computer Science Courses

Manas Tungare, Xiaoyan Yu, William Cameron, GuoFang Teng, Manuel A. Pérez-Quiñones, Lillian Cassel, Weiguo Fan, Edward A. Fox
Virginia Tech and Villanova University
{manas, xiaoyany, perez, wfan, fox}@vt.edu, {william.cameron, guofang.teng, lillian.cassel}@villanova.edu

ABSTRACT
A syllabus defines the contents of a course, as well as other information such as resources and assignments. In this paper, we report on our work towards creating a syllabus repository of Computer Science courses across universities in the USA. We present some statistics from our initial collection of 8000+ syllabi. We show a syllabus creator that is integrated with Moodle [5], an open-source course management system, which allows for the creation of a syllabus for a particular course. Among other information, it includes knowledge units from the Computing Curricula 2001 body of knowledge. The goal of the syllabus repository is to provide added value to the Computer Science Education community, and we present some such offerings. We conclude by presenting our future plans for the syllabus repository. These include using automated techniques to collect and classify syllabi, providing recommendations to instructors when creating a syllabus, and allowing the community to share their syllabi automatically. The syllabus collection will be part of the Computing and Information Technology Interactive Digital Educational Library (CITIDEL), a collection of the National Science Digital Library (NSDL).

Categories and Subject Descriptors
H.3.m [Information Storage and Retrieval]: Miscellaneous

Keywords
Syllabus, Curriculum, Computing Curricula 2001

1. INTRODUCTION
A syllabus forms the backbone of a course offering: a complete syllabus typically includes the course number, title, a description, the learning objectives of the course, a list of the topics covered, links to reference material such as books or publications, and other related information. The various learning objects that are included in a course offering are created based on the syllabus definition, and are tightly integrated with the reference material (also included in the syllabus).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCSE 2007 Covington, Kentucky, USA. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

Thus, knowledge of a course syllabus can be used to assess the structure of a course, the exact knowledge units covered, and the relative time devoted to each of them. The syllabus describes how individual learning objects are combined to form larger entities and packaged as a course for students.

We are in the process of creating a repository of Computer Science syllabi. The first generation of this collection is composed of syllabi collected from the Web. We also have created tools that allow professors to create a syllabus and automatically publish it to our collection. We plan to use our collection to provide recommendations to course creators (e.g., by suggesting a textbook to use for a particular course) and to provide other services to the Computer Science Education community. The collection of syllabi will be part of the Computing and Information Technology Interactive Digital Educational Library (CITIDEL), a collection of the National Science Digital Library (NSDL).
In this paper, we discuss our approach to collecting the initial syllabi in our repository, tools to help create new course syllabi, tools to compare existing syllabi, and future plans for our collection.

2. CREATING THE COLLECTION
In order to reap the larger advantages of a syllabus repository, a sizable collection of syllabus documents must first be available. However, populating an empty syllabus repository with enough syllabi within a short time frame would be a monumental task. Professors are more likely to continue publishing their syllabi over the Web as HTML or PDF documents instead of sharing them in a common repository. Including just a few syllabi in a collection would likely not provide enough return on the effort invested in sharing the syllabi. Thus, given the extensive availability of syllabi as published Web documents, we decided to gather them automatically.

We used Google to locate documents that show a high likelihood of being a syllabus. This was accomplished by two specialized sets of queries submitted to Google's search engine: the first, to locate departments of interest within an educational institution, and the second, to locate syllabi within that department. Since our current interest is in accumulating syllabi in the field of Computer Science, we issued a query for "computer science site:edu". The result of this query was a list of pages within the computer science departments of many US university websites (i.e., those whose domain names end with '.edu'). We processed each resulting URL to obtain the relevant domain name, i.e., the homepage of each department. The resulting set of departments included 98 Computer Science departments (e.g., cs.vt.edu). We then