LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES

Rūta Marcinkevičienė, Gintarė Grigonytė
Vytautas Magnus University, Kaunas, Lithuania

Abstract

The paper gives an overview of the compilation of the first corpus-based Dictionary of Lithuanian Phrases. The phrases are derived from collocational strings that were extracted from a 100-million-word corpus of contemporary Lithuanian by means of a new statistical method called Gravity counts. The paper presents the theoretical approach to the most relevant notions: collocation, collocational string, phrase, part of speech, and grammatical and lexicogrammatical pattern. The statistical method for extracting collocational strings is presented briefly, together with the initial output of raw collocational strings. The types of transformation of collocational strings into phrases and the other manual procedures are described in a nutshell, while the primary results of patterning the Lithuanian phrases and the future steps are presented in greater detail.

Keywords: collocation, collocational string, Gravity counts, fragment of text, POS pattern, grammatical pattern

1. Introduction

The compilation of the Dictionary of Lithuanian Phrases comprises three main phases: extraction of collocational strings from the corpus of present-day Lithuanian, transformation of collocational strings into phrases, and patterning of all the phrases. Each phase rests on certain theoretical approaches and on the notions of collocation, collocational string, phrase, and pattern, which are presented here.

Collocation is a fuzzy term embracing a great variety of notions. Its definition differs according to the researcher's standpoint and the method of extraction. With respect to form and structure, there are two different perspectives on the notion of collocation. One group of authors (J. Firth, J. Sinclair, M. Stubbs, among others) prefers a contextual or statistical definition of collocation.
It can be generalised as follows: one item collocates with another that appears somewhere near it in a given text. The underlying assumption concerns structure: a collocation consists of a node word and its collocates, so the search for a collocation starts with the node word. The statistical definition thus highlights the lexical relationship between two or more items that tend to co-occur. However, it does not allow one to detect multi-word collocations as they appear in texts or to define their boundaries. Statistical collocations are usually presented as lemmas for node words and their collocates.
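
The node-and-collocate view described above can be illustrated with a minimal sketch. It is not the Gravity counts method and is not taken from the paper: the function name `window_collocates`, the `span` parameter, and the English toy sentence are all assumptions made for illustration, and the sketch counts raw word forms rather than lemmas.

```python
from collections import Counter


def window_collocates(tokens, node, span=4):
    """Count the tokens found within `span` positions of each occurrence of `node`.

    Hypothetical helper: a plain window-based co-occurrence count,
    not the Gravity counts method used for the dictionary.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# Toy input (assumed, not corpus data).
tokens = "strong tea is better than weak tea but strong coffee is fine".split()
collocates = window_collocates(tokens, "tea", span=2)
print(collocates.most_common())
```

The result is an unordered set of node-collocate pairs with frequencies. It shows which items tend to occur near the node, but it says nothing about where a multi-word unit begins or ends in running text, which is the limitation noted above.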