The Corpus of Spoken Icelandic and Its Morphosyntactic Annotation

Eiríkur Rögnvaldsson
University of Iceland
eirikur@hi.is

Abstract

We describe the Corpus of Spoken Icelandic (ÍS-TAL), which consists of 15 hours of spontaneous, naturally occurring conversations, 31 conversations in all. The corpus comprises 184,080 tokens, 14,297 types and 9,221 lemmas. It has been transcribed using standard orthography. We present a list of the 30 most common lemmas in the corpus and compare it to a list of the most frequent lemmas in the written language, concluding that the differences between the two lists are smaller than expected. We have tagged the corpus morphologically with a statistical tagger that had been trained on written texts. The results are much better than we expected, and the tagging accuracy is at least as high as for the written texts. The final part of the paper is a report on work in progress. We have been experimenting with converting the morphological tagging into a shallow syntactic markup by applying a few simple hand-written rules. Even though the analysis obtained by this procedure is bound to be incomplete and to contain several errors, we conclude that the results are promising and that the method can be used to build a simple yet useful treebank with minimal effort.
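As a rough illustration of how a hand-written rule can turn tagged tokens into shallow syntactic markup, the following Python sketch brackets noun phrases. It is not the rule set used for ÍS-TAL: the coarse tags (`DET`, `ADJ`, `NOUN`, `VERB`), the function name `chunk_noun_phrases`, and the English example sentence are all hypothetical placeholders, chosen only to keep the example short.

```python
from typing import List, Tuple

# Hypothetical coarse tags standing in for a real morphological tagset.
NOMINAL = {"DET", "ADJ", "NOUN"}


def chunk_noun_phrases(tagged: List[Tuple[str, str]]) -> str:
    """Bracket each maximal run of DET/ADJ/NOUN tokens that ends in a NOUN."""
    out = []
    i = 0
    while i < len(tagged):
        # Extend j over the longest run of nominal tags starting at i.
        j = i
        while j < len(tagged) and tagged[j][1] in NOMINAL:
            j += 1
        # Trim trailing non-noun tokens so that the chunk ends in a noun.
        while j > i and tagged[j - 1][1] != "NOUN":
            j -= 1
        if j > i:
            words = " ".join(tok for tok, _ in tagged[i:j])
            out.append("[NP " + words + "]")
            i = j
        else:
            # No noun-final run starts here; emit the token unchanged.
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


sentence = [
    ("the", "DET"), ("old", "ADJ"), ("man", "NOUN"),
    ("saw", "VERB"), ("a", "DET"), ("dog", "NOUN"),
]
print(chunk_noun_phrases(sentence))
```

A single rule of this kind leaves everything outside noun phrases unanalysed, which matches the expectation stated above that such an analysis will be incomplete.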