Bandhu Collection Project: Working Paper 2

Aspect markers as an indicator of narrative style in Nepali

Andrew Hardie and Ram Lohani
Department of Linguistics and English Language
Lancaster University

Introduction

This project's first working paper outlined how, supported by a grant from the British Academy, [1] the Bandhu Collection of early-1970s Nepali spoken text was re-digitised in a number of forms (original orthographic form, transliterated Devanagari form, and part-of-speech tagged form). In this working paper, we exemplify one of the types of research for which the Collection may be employed.

Evidence from analysis of the Nepali National Corpus (NNC), outlined below, gives some reason to believe that narrativity (or narrative content) is a parameter that affects grammatical variation across Nepali text types, although, as noted below, this parameter may subsume a number of other dimensions of variation that have been identified in studies of English text types. Because the Bandhu Collection consists largely of prototypical oral narratives (anecdotes, folk stories, and so on), it is a highly useful point of contrast to the different written genres of the NNC, and throws the grammatical features of narrativity into relief.

In this working paper, we concentrate particularly on the frequency of markers of progressive aspect. However, we also look briefly at part-of-speech frequency as a more general grammatical parameter of variation.

Part-of-speech frequencies across genres in the Core Sample of the Nepali National Corpus

The NNC has been part-of-speech (POS) tagged using the Nelralec Tagset, [2] a set of around 100 word-level morphosyntactic categories. Using this annotated data, it is possible to identify which categories are characteristic of any given division of the corpus in comparison to the rest of the corpus. This is done using a keyness analysis based on statistical significance.
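A keyness comparison of this kind can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure actually used on the NNC: the tag names and counts are invented, the four-cell form of the log-likelihood statistic (Dunning 1993) is assumed, and 6.635 is taken as the critical value for p < 0.01 with one degree of freedom.

```python
import math

# Chi-square critical value for p = 0.01 with 1 degree of freedom.
CRITICAL_P01 = 6.635


def log_likelihood(freq_sub, size_sub, freq_rest, size_rest):
    """Log-likelihood (G2) for one tag: subcorpus vs. rest of corpus.

    Uses the full 2x2 contingency table (tag / not tag, by subcorpus / rest).
    """
    observed = [
        [freq_sub, size_sub - freq_sub],
        [freq_rest, size_rest - freq_rest],
    ]
    total = size_sub + size_rest
    row_totals = [size_sub, size_rest]
    col_totals = [freq_sub + freq_rest, total - freq_sub - freq_rest]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                e = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / e)
    return 2.0 * g2


def key_tags(sub_counts, rest_counts):
    """Return (tag, G2) pairs that are significantly over-represented
    in the subcorpus, strongest first."""
    size_sub = sum(sub_counts.values())
    size_rest = sum(rest_counts.values())
    keys = []
    for tag, f_sub in sub_counts.items():
        f_rest = rest_counts.get(tag, 0)
        overused = f_sub / size_sub > f_rest / size_rest
        g2 = log_likelihood(f_sub, size_sub, f_rest, size_rest)
        if overused and g2 > CRITICAL_P01:
            keys.append((tag, g2))
    return sorted(keys, key=lambda pair: -pair[1])


# Hypothetical tag counts, invented for illustration only.
subcorpus = {"TAG_A": 500, "TAG_B": 300, "TAG_C": 200}
remainder = {"TAG_A": 5000, "TAG_B": 1000, "TAG_C": 4000}
result = key_tags(subcorpus, remainder)
print([tag for tag, _ in result])
```

In this toy data only TAG_B is both relatively more frequent in the subcorpus and significant at the chosen threshold; TAG_C differs significantly but in the opposite direction, so it is not reported as key.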
Any POS tag whose relative frequency is greater in a given subcorpus than in the remainder of the corpus, to a degree that is highly statistically significant, [3] is considered a key tag for that subcorpus.

[1] Grant reference: British Academy Small Research Grant SG-42148.
[2] See Hardie et al. (2005, forthcoming).
[3] With a p-value less than 0.01. The log likelihood statistic was used (see Dunning 1993).

The use of POS category frequency as a parameter of grammatical variation in texts is well-