Evaluating term concept association measures for short text expansion: two case studies of classification and clustering

Alessandro Marco Boutari, Claudio Carpineto, and Raffaele Nicolussi
Fondazione Ugo Bordoni, Rome, Italy
{aboutari, carpinet, rnicolussi}@fub.it

Abstract. The proliferation of Web applications based on short texts represents both an opportunity and a challenge to text mining algorithms, because of sparse representations and lack of shared context. To address this problem, we investigate a term expansion approach based on analyzing the relationships between the term concepts present in the concept lattice associated with the document corpus. We define five term concept association measures: proximity, concept similarity, connection strength, damping-weighted proximity, and proximity&strength. By means of two case studies, we evaluate the effectiveness of these measures for expansion-enhanced K-NN classification and K-Means clustering of short texts. The results suggest that the five measures are highly competitive, with the best measure showing a clear improvement over the corresponding unenhanced K-NN and K-Means algorithms, as well as over two alternative term expansion enhancements (i.e., based on WordNet and on pseudo-relevance feedback).

1 Introduction

Short texts play an increasingly important role in modern Web communication and publishing, such as Twitter messages, blogs, news feeds, and customer reviews. This opens new application avenues for text mining techniques, but it also raises new scientific challenges. Although text classification and clustering are well-established techniques (e.g., [18], [13]), they perform poorly on short and sparse data, because standard text similarity measures require substantial word co-occurrence or shared context.

There are two main approaches to addressing the problems raised by short texts.
Either we define new semantic similarity functions by means of external knowledge sources, without changing the underlying document representation (e.g., [15], [2], [16]), or we expand the given texts before applying the traditional syntactic document similarity functions (e.g., [11], [10], [1]). Our work belongs to the latter line of research. We investigate a method for text expansion that exploits the features of the concept lattice built from the document-term matrix. We model the similarity between two terms as a function of the relationships between the corresponding