Classification of Textual Genres using Discourse Information

Elnaz Davoodi, Leila Kosseim, Félix-Hervé Bachand, Majid Laali, and Emmanuel Argollo

Department of Computer Science & Software Engineering
Concordia University
Montreal, Canada
e_davoo@encs.concordia.ca, leila.kosseim@concordia.ca, felixherve@gmail.com, m_laali@encs.concordia.ca, emmanuel.argollo@gmail.com

Abstract. This paper aims to measure the influence of textual genre on the usage of discourse relations and discourse markers. Specifically, we wish to evaluate to what extent the use of certain discourse relations and discourse markers is correlated with textual genre and consequently can be used to predict textual genre. To do so, we have used the British National Corpus and compared a variety of discourse-level features on the task of genre classification. The results show that, individually, discourse relations and discourse markers do not outperform the standard bag-of-words approach, even with an identical number of features. However, discourse features do provide a significant increase in performance when they are used to augment the bag-of-words approach. Using discourse relations and discourse markers allowed us to increase the F-measure of the bag-of-words approach from 0.796 to 0.878.

1 Introduction

Well-written texts are composed of textual units that are connected to each other via discourse relations. Such relations (e.g. CAUSE, CONDITION) communicate an inference intended by the writer and allow the creation of coherent connections between textual units. Discourse relations can be made explicit through discourse markers such as but, since, and because, or can be left implicit when no explicit cue phrase is used to indicate the relation. Previous work such as [26, 1, 18, 4] has shown a correlation between the use of discourse relations and certain textual dimensions, such as genre, level of formality and level of readability.
For example, [26] has shown that the distribution of discourse relations in the PDTB corpus [19] is influenced by the textual genre; that is, texts from some genres tend to contain more of certain discourse relations than texts from other genres.

The goal of this paper is to provide more insight into these preliminary investigations and to measure the influence of textual genre on the usage of discourse relations and discourse markers on a larger scale. Specifically, we wish to evaluate to what extent the use of discourse relations and discourse markers is correlated with textual genre and consequently can be used to predict textual genre. To do so, we have used the British