This is a preprint of an article whose final and definitive form will be published in the Journal of Quantitative Linguistics 14-2 of 2007 [copyright Taylor & Francis]; the Journal of Quantitative Linguistics is available online at: http://journalsonline.tandf.co.uk/

The New Arboretum of Indo-European “Trees”
Can New Algorithms Reveal the Phylogeny and Even Prehistory of IE¹?

Hans J. Holm
Hannover, Germany

Abstract

Specialization in linguistics vs. biological informatics leads to widespread misunderstandings and false results, caused by poor knowledge of the essential conditions of the respective methods and data applied. These conditions are analyzed, and the insights are used to assess the recent glut of attempts to employ methods from biological informatics in establishing new phylogenies of Indo-European languages.

INTRODUCTION²

In the last ten years, the easy availability of phylogeny reconstruction packages has led to a veritable arboretum of newly developed “trees” of Indo-European. Assessments range from total disapproval by most traditional historical linguists to enthusiastic but uncritical circulation in magazines and journals. We may at least note with pleasure that they demonstrate a strong public interest in the Indo-European Urheimat question. The authors are proud to distinguish the main languages, which is no progress at all, since these results have been obtained by even the oldest methods (cf. Holm 2005 [3.1.1]). However, at the higher levels most ‘trees’ (often only ‘binary topologies’) differ from each other, as well as from traditional views³, or show only insignificant, brush-like branchings. The reader is left with these differences unexplained, and parallel work is seldom analyzed. Thus, these new results are not beneficial.

What are the reasons for these differences? All studies so far have preferred a ‘trial and error’ approach. However, it has too often been difficult to determine whether the differences (or errors?)
are due to the data, the methods, or both⁴. In this study, therefore, we will analyze the data and methods applied, following a different line of scientific reasoning:

For the main “problem two” (subgrouping), we shall analyze the different methodological approaches and check whether the applied methods are appropriate for the subgrouping of languages. This final aim first requires
a look at the ‘final’ test options adduced in the two fields.
According to this line of reasoning, we analyze the functional conditions and assumptions for which the adduced algorithms were designed, and in particular whether these are met in linguistics;
As a basis, we need to look at traditional methods of subgrouping in historical linguistics;
First, let us start with the frequently included and easier (?) “problem one” (glottochronology):

¹ Indo-European
² I owe thanks for helpful comments and corrections from many sides, most of all to Sheila Embleton, Joe Felsenstein, and Johann Wägele. Of course, all remaining mistakes are my own responsibility.
³ E.g., presence vs. absence of the Indo-Iranian or Balto-Slavic group.
⁴ Cf. also Nakhleh et al. (2005).