Chapter 17

The Genetics of Language and Farming Spread in India

Toomas Kivisild, Siiri Rootsi, Mait Metspalu, Ene Metspalu, Juri Parik, Katrin Kaldma, Esien Usanga, Sarabjit Mastana, Surinder S. Papiha & Richard Villems

Most maternal lineages of present-day Indians derive from a common ancestor in mtDNA haplogroup M that split into Indian, eastern Asian, Papuan, and Australian subsets 40,000 to 60,000 mtDNA-years ago. The second major component in Indian maternal heredity lines traces back to the split of haplogroup U into Indian, western Eurasian and northern African variants at approximately the same time. The variation in these two ancient Indian-specific sets of lineages is the main modifier in the heterogeneity landscape of Indian populations, defining the genetic differences between caste groups and geographic regions in the sub-continent. The difference between regional caste groups is accentuated furthermore by the presence of a northwest to south decline of a minor package of lineages of western Asian or European origin.

In contrast, the majority of Indian paternal lineages do not share recent ancestors with eastern Asian populations but stem from haplogroups common to (eastern) European or western Asian populations. This finding has recently been interpreted in favour of the classical Indo-Aryan invasion hypothesis. Here, we show that this interpretation is probably caused by a phylogeographically-limited view of the Indian Y-chromosome pool, amplified because of current inconsistencies in the interpretation of the temporal scale of the variability in the non-recombining part of the Y chromosome (NRY). It appears to us that the high variability of STRs in the background of NRY variants in India is consistent with the view of largely autochthonous pre-Holocene genetic diversification, a conclusion reached earlier for the Indian maternal lineages (Kivisild et al. 1999a).
While interpreting the genetic aspects of farming/language dispersal in the Indian context, it is easy to get lost in its 'multitude of endogamous pockets' (Cavalli-Sforza et al. 1994). Yet a forest can hopefully be seen behind the trees, provided that the conclusions to be drawn derive from a phylogeographically representative analysis of the people of the sub-continent. Perhaps new ideas, analogous to the recently introduced 'SPIWA' model for Europe (see Renfrew this volume), are needed when developing new farming/language dispersal models for India.

The earliest 'agricultural package' in the Indian subcontinent - a combined presence of wheat, barley, cattle, sheep and goat domestication - is found in Mehrgarh, Baluchistan, and dates to about 9000 years before present (BP). It spread first into an area extending from the Punjab in the northwest to Uttar Pradesh in the east and to Gujarat in the south. It took another 4000 years before it eventually reached southern Peninsular India (Chakrabarti 1999). In this northwestern early agricultural region lie the roots of the Indus Civilization, and any later cultural influence or human migration from the northwest or west had to pass through this area in order to reach the rest of India.

Neolithic communities in India did not start on empty ground. Cultural complexes belonging to a comparatively short Mesolithic episode developed from the preceding Middle and Upper Palaeolithic cultures and continued to exist through the Neolithic, Bronze and Iron Ages, with microlithic tools continuing in use here and there in some communities even today. The advent of agriculture in India, although largely reflecting local developments, is to be understood against the background of agricultural growth in its geographic neighbourhood, encompassing the Iranian plains and the Fertile Crescent in the west, and Southeast Asia - as far as rice is concerned - in the east (Chakrabarti 1999).
Three quarters of the Indian population today speak Indo-European (IE) languages. Next, in terms of the number of speakers, is the Dravidian lan-