Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint

Dang Nguyen 1, Wei Luo 1, Tu Dinh Nguyen 1, Svetha Venkatesh 1, Dinh Phung 2

1 Deakin University, Geelong, Australia, Center for Pattern Recognition and Data Analytics, School of Information Technology
{d.nguyen, wei.luo, tu.nguyen, svetha.venkatesh}@deakin.edu.au
2 School of Information Technology, Monash University, Clayton Campus, VIC 3800, Australia
dinh.phung@monash.edu

Abstract. When learning sequence representations, traditional pattern-based methods often suffer from data sparsity and high dimensionality, while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.

1 Introduction

Many real-world applications such as web mining, text mining, bio-informatics, system diagnosis, and action recognition have to deal with sequential data. The core task of such applications is to apply machine learning methods, for example, K-means or Support Vector Machine (SVM), to sequential data to find insightful patterns or build effective predictive models. However, this task is challenging since machine learning methods typically require fixed-length vectors as inputs, a format not directly applicable to sequences.
A well-known solution in data mining is to use sequential patterns (SPs) as features [6]. This approach first mines SPs from the dataset, and then represents each sequence as a feature vector whose binary components indicate whether that sequence contains a particular sequential pattern. Since the number of SPs is often large, the dimension of the resulting feature space is huge, which leads to the high-dimensionality and data sparsity problems. To reduce the dimension of the feature space, many researchers have tried to extract only interesting SPs under an unsupervised setting [17,9,5] or discriminative SPs under a supervised setting [6,19,3]. These methods discover interesting