TESTING THE PROJECTIVITY HYPOTHESIS

Vladimir Pericliev
Mathematical Linguistics Dpt
Institute of Mathematics with Comp. Centre
1113 Sofia, bl. 8, BULGARIA

Ilarion Ilarionov
Mathematics Dpt
Higher Inst of Eng & Building
Sofia, BULGARIA

ABSTRACT

The empirical validity of the projectivity hypothesis for Bulgarian is tested. It is shown that the justification of the hypothesis presented for other languages suffers serious methodological deficiencies. Our automated testing, designed to evade such deficiencies, yielded results falsifying the hypothesis for Bulgarian: the non-projective constructions studied were in fact grammatical, rather than ungrammatical as the projectivity thesis implies. Despite this, the projectivity/non-projectivity distinction itself has to be retained in Bulgarian syntax and, with some provisions, in systems for automatic processing as well.

1 THE PROJECTIVITY HYPOTHESIS

Projectivity is a word-order constraint in dependency grammars, analogous to continuous constituency in phrase-structure systems. In a projective sentence, the only words that can be positioned between two words connected by a dependency arc are words governed (directly or indirectly) by one of these two words. In other words, a sentence is projective if and only if there are no intersections between arcs and projections in its dependency tree diagram. Thus, for instance, sentence (1) is projective, whereas sentence (2) is non-projective:

(1) He took the book
(2) He the took book

We might note that sentence (2) is ungrammatical. The projectivity hypothesis, originally propounded by Lecerf (cf. e.g. Lecerf 1960) and later widely accepted, amounts to the following: natural languages are projective in the sense that the non-projective constructions in them are ungrammatical. This has an important consequence.
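The projectivity condition defined above lends itself to a mechanical check. The Python sketch below is purely illustrative and is not the authors' procedure. It assumes a hypothetical encoding of a dependency tree as a list `heads`, where `heads[i]` is the position of the governor of word `i` and the root word has `None`. The hypothetical helpers `dominates` and `is_projective` test, for every arc, whether each word lying between the arc's two ends is governed, directly or indirectly, by the arc's governor. A word governed by the dependent is thereby also governed by the governor, so this matches the phrase "governed by one of these words".

```python
from typing import List, Optional

# Hypothetical encoding: heads[i] is the index of the governor of word i,
# or None for the root. Assumes heads describes a well-formed tree (no cycles).
Heads = List[Optional[int]]


def dominates(heads: Heads, governor: int, word: int) -> bool:
    """True if `governor` governs `word` directly or indirectly."""
    current = heads[word]
    while current is not None:  # climb towards the root
        if current == governor:
            return True
        current = heads[current]
    return False


def is_projective(heads: Heads) -> bool:
    """For every arc governor -> dependent, each word strictly between
    the two must be governed (directly or indirectly) by the governor."""
    for dependent, governor in enumerate(heads):
        if governor is None:
            continue  # the root has no incoming arc
        left, right = sorted((governor, dependent))
        for k in range(left + 1, right):
            if not dominates(heads, governor, k):
                return False  # the arc crosses the projection of word k
    return True


# (1) He took the book: He->took, the->book, book->took; "took" is the root
print(is_projective([1, None, 3, 1]))  # True
# (2) He the took book: the arc the->book spans "took", not governed by "book"
print(is_projective([2, 3, None, 2]))  # False
```

Applied to sentences (1) and (2), the check accepts the first and rejects the second, because in (2) the arc linking "the" to "book" spans "took", which "book" does not govern.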
Thus, taking into account the self-evident fact that ungrammatical phrases do not occur in texts, in the processing of texts we can rule out non-projective parses from consideration on the grounds of ungrammaticality. Projectivity thus serves as a filtering device, one shown further to be extremely powerful (op. cit.).

Estimating the usefulness of the projectivity hypothesis for each particular language requires extensive empirical testing. On the basis of statistical counts from the inspection of texts, Lecerf reported French to be almost 100% projective. According to him, the same would be true of other languages like German, Italian, Dutch, etc., although the material available (at the time) was not sufficient for statistical processing. English is also believed to be a projective language: in 30 000 phrases only two non-projective ones were found (Harper and Hays 1959). Kareva (1965), using different notation, obtained a somewhat different result, but still one in the same vein: of 10 000 phrases of connected text, 620 were found to be non-projective.

Such investigations are bound together by their approach to the testing of the projectivity hypothesis: texts are explored and statistical counts are made of the proportion of projective to non-projective phrases. The very rare occurrence of non-projective sentences in such texts is interpreted as confirming evidence. Such studies represent what we shall further on refer to as "the textual approach to the testing of the projectivity hypothesis" (or simply, "the textual approach").

2 DEFICIENCIES OF THE TEXTUAL APPROACH

The textual approach, in addition to involving the tedious inspection of thousands of sentences, suffers serious methodological shortcomings, which can be summarized as follows:

(i) Irrelevance of data.
The data the textual approach presents in justification of the hypothesis is, strictly speaking, irrelevant. Knowing that non-projective phrases do not occur in texts naturally gives us no formal right to infer that such phrases are also ungrammatical.

(ii) Insufficiency of data. The data provided by this approach is insufficient to justify even the weaker claim that non-projective structures do not occur in texts. To justify this latter claim, further steps would have to be taken beyond the direct inspection of particular corpora of texts (no matter how large). In particular, an adequate justification would have to involve both further factual confirmation (e.g. a demonstration that predictions from the hypothesis in fact comply with actual data) and "systematic" confirmation (a demonstration that the hypothesis is consistent with other linguistic principles, facts, etc.) (cf. e.g. Botha 1981: Ch. 9; also § 3 below).

(iii) Heuristic futility. The textual approach is heuristically futile in the sense that, being confined to a mere registration of non-projective constructions within specific texts, it gives us no way of knowing whether the structures encountered (if any are encountered at all) are all the non-projective structures in a given language, and if not, how many more there are and which exactly they are.

3 TESTING THE PROJECTIVITY HYPOTHESIS FOR BULGARIAN

The considerations given in § 2 seriously undermine the credibility of the results obtained for other languages by the textual approach. What was important for our investigation, however, was to evade these methodological deficiencies in the study of Bulgarian. Accordingly, rather than addressing texts, we had to generate all logically admissible non-projective structures