DISAMBIGUATING TURKISH WITH NOOJ

ÜMİT MERSİNLİ

Abstract

This paper presents the observed frequencies of ambiguities in the Turkish National Corpus, following an introduction that reviews previous work on disambiguating Turkish. Rule-based disambiguation processes using NooJ components are also presented, with sample grammars.

Introduction

This paper presents the observed frequencies of ambiguities in the Turkish National Corpus Project (TNC) and the current state of rule-based disambiguation processes using NooJ components. Following a literature review, the first section lists the most common ambiguities. The second section demonstrates disambiguation practices using the components of NooJ. The conclusion presents suggestions and future directions.

Documenting the observed frequencies of homographic word forms, affixes, and their combinations, as presented in the section below, serves as a starting point for the ongoing rule-based disambiguation phases of the TNC.

Previous work on rule-based disambiguation of Turkish, carried out mainly in the 1990s following the two-level morphological specifications of Turkish, focused on constraints for pre-defined ambiguities. Kuruöz (1994) presents a tagger and disambiguator tested on a 7,004-word sample text in which 78% of the parses have two or more tags and are thus ambiguous. Tür (1996) reports that, of 60,873 words, 49.5% of the parses are ambiguous before the hand-crafted contextual rules are applied. Any generalization about the ambiguity potential of a language is problematic, since the figure varies with the approach used, the rules defined, the number of tags, the size of the corpus or lexicon, and so on. We can nevertheless conclude, as Yüret & Türe (2006) state, that “close to half of the words in running text are morphologically ambiguous in Turkish”.
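The ambiguity rates quoted above rest on a simple count: a word is ambiguous when the analyzer returns two or more candidate parses for it. The Python sketch below illustrates that count only. The function name `ambiguity_rate`, the toy lexicon, and the tag notation are hypothetical and chosen for illustration; they are not NooJ output, TNC data, or the tag sets used in the cited studies.

```python
from typing import Dict, List


def ambiguity_rate(tokens: List[str], lexicon: Dict[str, List[str]]) -> float:
    """Return the share of tokens that receive two or more candidate parses."""
    if not tokens:
        return 0.0
    # A token counts as ambiguous when the lexicon lists at least two parses.
    ambiguous = sum(1 for token in tokens if len(lexicon.get(token, [])) >= 2)
    return ambiguous / len(tokens)


# Hypothetical toy lexicon; the tag notation is illustrative only.
TOY_LEXICON = {
    "yüz": ["yüz,N", "yüz,NUM", "yüz,V"],     # 'face', 'hundred', 'swim'
    "evin": ["ev,N+GEN", "ev,N+POSS2SG"],     # 'of the house', 'your house'
    "ev": ["ev,N"],                           # 'house'
    "gel": ["gel,V"],                         # 'come'
}

tokens = ["yüz", "ev", "evin", "gel"]
rate = ambiguity_rate(tokens, TOY_LEXICON)
print(f"{rate:.0%} of tokens are ambiguous")  # prints "50% of tokens are ambiguous"
```

Because the count depends entirely on how many parses the lexicon and rules assign to each form, changing the tag set or the lexicon changes the rate, which is why figures from different studies are hard to compare directly.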