Int. J. Intelligent Systems Technologies and Applications, Vol. 14, Nos. 3/4, 2015
Copyright © 2015 Inderscience Enterprises Ltd.

A new term weighting scheme for text categorisation

Fatiha Barigou
Laboratory of Computer Science of Oran, Department of Computer Science, University of Oran 1, Ahmed Ben Bella, Oran 31000, Algeria
Email: fatbarigou@gmail.com

Abstract: Recently, the study of term weighting schemes has increasingly attracted the attention of researchers in the field of text categorisation (TC). Unlike information retrieval, TC is a supervised learning task that makes use of prior information about the distribution of training documents across predefined categories. This information, which traditional weighting schemes omit, is considered very useful and has been widely used for term selection and for building classifiers. This paper studies and analyses a new weighting measure intended to improve the performance of k nearest neighbours (kNN)-based TC.

Keywords: text categorisation; term weighting; supervised term weighting scheme; kNN; k nearest neighbours.

Reference to this paper should be made as follows: Barigou, F. (2015) ‘A new term weighting scheme for text categorisation’, Int. J. Intelligent Systems Technologies and Applications, Vol. 14, Nos. 3/4, pp.256–272.

Biographical notes: Fatiha Barigou graduated from the Department of Computer Science, University of Oran 1, Algeria. In 2012, she received her PhD in Computer Science from the University of Oran 1. She is currently a Research Member of the Laboratory of Computer Science of Oran. Her research interests include natural language processing, information extraction, information retrieval, knowledge-based systems, pattern recognition and data mining.

1 Introduction

Nowadays, electronic information is abundantly available.
The World Wide Web, for example, is continually enriched with new content: companies are storing more and more data, email has become an extremely popular form of communication, and old manuscripts are now available in digital form. All this information would be of little use if our ability to access it effectively did not increase as well. We therefore need tools to organise and access this data. One successful solution to this problem is automatic text categorisation (TC). The task of TC consists of assigning new documents to predefined categories on the basis of knowledge gained during a training phase, in which a classification system is built from a set of labelled training examples and a learning algorithm. According to Sebastiani (2002), building an automated TC system is based on three main