International Journal of Computational Linguistics and Applications vol. 7, no. 2, 2016, pp. 129–143
Received 05/02/2016, accepted 07/03/2016, final 16/09/2016
ISSN 0976-0962, http://ijcla.bahripublications.com
This is a pre-print version of the paper, before proper formatting and copyediting by the editorial staff.

Short Text Classification on Complaint Documents

SHIRLEY ANUGRAH HAYATI, ALFAN FARIZKI WICAKSONO, AND MIRNA ADRIANI
Universitas Indonesia, Indonesia

ABSTRACT

The Indonesian government has developed a system that lets citizens voice their aspirations and complaints, which are then stored as short documents. Unfortunately, the existing system relies on human annotators to categorize these short documents manually, which is expensive and time-consuming. Automatically classifying the short documents into their correct topics would therefore reduce manual work and increase the efficiency of the task. In this paper, we propose several approaches to automatically classify these short documents using various features, such as unigrams, bigrams, and their combination. We also demonstrate the use of information gain and Latent Dirichlet Allocation (LDA) for selecting discriminative features.

1 INTRODUCTION

Short Message Service (SMS) and the Internet have become important and powerful communication media. Some countries take advantage of these advances in information technology to develop websites through which their citizens can give feedback or report problems related to government policies. Public feedback