Evaluating term concept association measures for short text expansion: two case studies of classification and clustering

Alessandro Marco Boutari, Claudio Carpineto, and Raffaele Nicolussi

Fondazione Ugo Bordoni, Rome, Italy
{aboutari, carpinet, rnicolussi}@fub.it

Abstract. The proliferation of Web applications based on short texts represents both an opportunity and a challenge to text mining algorithms, because of sparse representations and lack of shared context. To address this problem, we investigate a term expansion approach based on analyzing the relationships between the term concepts present in the concept lattice associated with the document corpus. We define five term concept association measures: proximity, concept similarity, connection strength, damping-weighted proximity, proximity&strength. By means of two case studies, we evaluate the effectiveness of these measures for expansion-enhanced K-NN classification and K-Means clustering of short texts. The results suggest that the five measures are highly competitive, with the best measure showing a clear improvement over the corresponding unenhanced K-NN and K-Means algorithms, as well as over two alternative term expansion enhancements (i.e., based on Wordnet and on pseudo-relevance feedback).

1 Introduction

The increasingly important role played by short texts in the modern means of Web communication and publishing, such as Twitter messages, blogs, news feeds, and customer reviews, opens new application avenues for text mining techniques but it also raises new scientific challenges. Although text classification and clustering are well established techniques (e.g., [18], [13]), they are not successful in dealing with short and sparse data, because standard text similarity measures require substantial word co-occurrence or shared context.

There are two main approaches to address the problems raised by short texts.
Either we try to define new semantic similarity functions by means of external knowledge sources, without changing the underlying document representation (e.g., [15], [2], [16]), or we expand the given texts prior to using the traditional syntactic document similarity functions (e.g., [11], [10], [1]). Our work belongs to the latter research line. We investigate a method for text expansion that exploits the features of the concept lattice built from the document-term matrix. We model the similarity between two terms as a function of the relationships between the corresponding