A Scalable Method for Preserving Oral Literature from Small Languages

Steven Bird

Dept of Computer Science and Software Engineering, University of Melbourne
Linguistic Data Consortium, University of Pennsylvania

Abstract. Can the speakers of small languages, which may be remote, unwritten, and endangered, be trained to create an archival record of their oral literature, with only limited external support? This paper describes the model of “Basic Oral Language Documentation”, as adapted for use in remote village locations, far from digital archives but close to endangered languages and cultures. Speakers of a small Papuan language were trained and observed during a six week period. Linguistic performances were collected using digital voice recorders. Careful speech versions of selected items, together with spontaneous oral translations into a language of wider communication, were also recorded and curated. A smaller selection was transcribed. This paper describes the method, and shows how it is able to address linguistic, technological and sociological obstacles, and how it can be used to collect a sizeable corpus. We conclude that Basic Oral Language Documentation is a promising technique for expediting the task of preserving endangered linguistic heritage.

1 Introduction

Preserving the world’s endangered linguistic heritage is a daunting task, far exceeding the capacity of existing programs that sponsor the typical 2-5 year “language documentation” projects. In recent years, digital voice recorders have reached a sufficient level of audio quality, storage capacity, and ease of use, to be used by local speakers who want to record their own languages. This paper investigates the possibility of putting the language preservation task into the hands of the speech community.
With suitable training, community members can be equipped to collect a variety of oral discourse genres from a broad cross-section of the speech community, and then provide additional content to permit the recordings to be interpreted by others who do not speak the language, greatly enhancing the archival value of the materials.

This paper describes a method for preserving oral discourse, originating in field recordings made by native speakers, and generating a variety of products including digitally archived collections. It addresses the problem of unwritten languages being omitted from various ongoing efforts to collect language resources for ever larger subsets of the world’s languages [1]. The starting point is Reiman’s approach [2], modified and refined so that it uses appropriate technology for Papua New Guinea, and so that it can scale up more easily. The method