A Novel Single-Label Supervised Text Classification Approach based on Mining Association Rules

Yanbo Wang, Frans Coenen, and Paul Leng
Department of Computer Science, University of Liverpool
Liverpool L69 7ZF, United Kingdom
{jwang, frans, phl}@csc.liv.ac.uk

Abstract

In this paper, we introduce a novel single-label supervised text classification approach based on mining association rules, called Apriori-TFP-TC. We follow the common framework of text mining, separating text classification into two stages: (1) text pre-processing and (2) the application of a selected data mining classification technique. We describe two text pre-processing techniques to refine the input texts: one that identifies relevant single terms in the texts, and a second that identifies multi-term sets. The pre-processed data is then input to our association-rule-based classification rule mining algorithm, Apriori-TFP-C. We present results showing that this approach can deliver good classification accuracy with computational efficiency.

Keywords: Text Mining, Association Rules, Classification Rules.

1. Introduction

Text Mining (TM) is a form of Knowledge Discovery in Databases (KDD) that focuses on discovering hidden patterns, rules, regularities and trends in non-database-like data, especially textual data. An important aspect of TM is Text Classification (TC), the automated categorization of texts into pre-defined classes. TC is the process of assigning a Boolean value to each pair (d_j, c_i) ∈ COD × C, where COD is a collection of documents, C = {c_1, …, c_|C|} is a set of pre-defined classes, and each pair (d_j, c_i) couples a document d_j ∈ COD being labelled with a class c_i ∈ C. TC has been widely studied in Machine Learning during the last decade [14]. Methods of TC can be divided into two significant groups: (1) single-label TC, which assigns exactly one class to each d_j ∈ COD, and (2) multi-label TC, which assigns k classes to each d_j ∈ COD.
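
The definition above can be made concrete with a small sketch. The following Python fragment is illustrative only: the document identifiers, class names and helper functions are hypothetical and do not come from the paper. It represents a classification as a Boolean value for every pair (d_j, c_i) in COD × C and checks whether that assignment satisfies the single-label condition.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy collection COD and class set C (not from the paper).
DOCS: List[str] = ["d1", "d2", "d3"]
CLASSES: List[str] = ["sport", "politics"]

# A classification: one Boolean value per pair (d_j, c_i) in COD x C.
Decision = Dict[Tuple[str, str], bool]


def make_decision(assigned: Dict[str, Set[str]]) -> Decision:
    """Expand a document -> classes mapping into a Boolean per pair."""
    return {
        (d, c): c in assigned.get(d, set())
        for d in DOCS
        for c in CLASSES
    }


def is_single_label(decision: Decision) -> bool:
    """True when every document is assigned exactly one class."""
    return all(
        sum(decision[(d, c)] for c in CLASSES) == 1
        for d in DOCS
    )


# Single-label: each document receives exactly one class.
single = make_decision({"d1": {"sport"}, "d2": {"politics"}, "d3": {"sport"}})

# Multi-label: a document may receive several classes, or none.
multi = make_decision({"d1": {"sport", "politics"}, "d2": {"politics"}, "d3": set()})
```

Under these assumptions, `is_single_label(single)` holds while `is_single_label(multi)` does not, which mirrors the distinction between the two groups of TC methods.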
Most TC methods involve supervised learning, in which a classifier is constructed from a pre-labelled set of training cases. A summary of approaches to supervised TC is given in [15]. As with other TM tasks, TC usually involves two stages: (1) Text Pre-processing