Information Sciences 478 (2019) 524–539
Contents lists available at ScienceDirect
Information Sciences
journal homepage: www.elsevier.com/locate/ins

Mining conditional discriminative sequential patterns

Zengyou He a,b,*, Simeng Zhang a, Feiyang Gu c, Jun Wu d

a School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
b Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Tuqiang Road 321, Dalian 116620, China
c Baidu Inc., Beijing, China
d School of Information Engineering, Zunyi Normal University, Zunyi, China

Article info
Article history: Received 26 May 2018; Revised 5 October 2018; Accepted 17 November 2018; Available online 19 November 2018
Keywords: Sequential pattern; Discriminative pattern; Pattern mining; Pattern-based classification

Abstract
Discriminative sequential pattern mining is one of the most important topics in pattern mining and has a very wide range of applications. It aims to extract sequential patterns whose frequencies differ significantly across classes. In recent years, a variety of algorithms for mining discriminative sequential patterns have been proposed, but these algorithms still generate many redundant patterns. Among the many factors that may make reported patterns redundant, subset-induced redundancy is the most critical: some patterns are reported as discriminative mainly because some of their sub-patterns are strongly discriminative. To solve the subset-induced redundancy issue, we propose the concept of a conditional discriminative sequential pattern and design a new algorithm called CDSPM (Conditional Discriminative Sequential Pattern Mining) for extracting such patterns. Experimental results on real data sets show that CDSPM can effectively remove discriminative sequential patterns that are redundant with respect to their sub-patterns.
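Subset-induced redundancy can be made concrete with a small sketch. It is not CDSPM and does not reproduce the paper's definition of a conditional discriminative sequential pattern. The discriminative score (support in the target class minus support in the other class), the threshold `delta`, the toy data, and every function name below are hypothetical choices for illustration only. A pattern is flagged here as subset-redundant when it clears the threshold on the full data but not when it is evaluated only on the sequences that already contain one of its sub-patterns.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]
Database = List[Sequence[str]]


def contains(seq: Sequence[str], pattern: Pattern) -> bool:
    """True if `pattern` occurs in `seq` as an ordered, possibly gapped, subsequence."""
    it = iter(seq)
    # Each `item in it` consumes the iterator up to the first match,
    # so later pattern items can only match later positions in `seq`.
    return all(item in it for item in pattern)


def support(db: Database, pattern: Pattern) -> float:
    """Fraction of sequences in `db` that contain `pattern` (0.0 for an empty db)."""
    if not db:
        return 0.0
    return sum(contains(s, pattern) for s in db) / len(db)


def support_diff(pos: Database, neg: Database, pattern: Pattern) -> float:
    """Hypothetical discriminative score: target-class support minus other-class support."""
    return support(pos, pattern) - support(neg, pattern)


def conditional_diff(pos: Database, neg: Database, pattern: Pattern, sub: Pattern) -> float:
    """Same score, but measured only on the sequences that already contain `sub`."""
    pos_sub = [s for s in pos if contains(s, sub)]
    neg_sub = [s for s in neg if contains(s, sub)]
    return support_diff(pos_sub, neg_sub, pattern)


def is_subset_redundant(pos: Database, neg: Database, pattern: Pattern,
                        sub: Pattern, delta: float) -> bool:
    """Hypothetical check: discriminative overall, but not once conditioned on `sub`."""
    return (support_diff(pos, neg, pattern) >= delta
            and conditional_diff(pos, neg, pattern, sub) < delta)


# Toy data (hypothetical): ("a",) separates the classes; ("a", "b") adds nothing beyond it.
pos = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "c")]
neg = [("a", "b"), ("a", "c"), ("c", "b"), ("c", "b")]

print(support_diff(pos, neg, ("a",)))                            # prints 0.5
print(support_diff(pos, neg, ("a", "b")))                        # prints 0.25
print(conditional_diff(pos, neg, ("a", "b"), ("a",)))            # prints 0.0
print(is_subset_redundant(pos, neg, ("a", "b"), ("a",), 0.2))    # prints True
```

In this toy example, ("a", "b") looks discriminative on the full data only because every sequence containing it also contains the strongly discriminative sub-pattern ("a",); restricted to sequences that contain "a", the two classes support ("a", "b") equally.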
© 2018 Elsevier Inc. All rights reserved.

* Corresponding author.
E-mail addresses: zyhe@dlut.edu.cn (Z. He), zhangsimeng1228@163.com (S. Zhang), flysea.gu@gmail.com (F. Gu), wujun.myway@gmail.com (J. Wu).
https://doi.org/10.1016/j.ins.2018.11.043
ISSN 0020-0255

1. Introduction

Sequential data consist of a set of sequences, where each sequence is an ordered list of discrete events or elements. Examples of sequential data include biological sequences, web-usage data, speech signals, human activities [26,27], and multi-neuronal spike trains [30]. Since Agrawal and Srikant first introduced the concept of the frequent sequential pattern [1], many efficient algorithms have been proposed for mining such patterns from sequential data (e.g., GSP [31], SPAM [2], LAPIN [36], FreeSpan [18], PrefixSpan [19]). Meanwhile, the frequent sequential pattern mining problem has been extended to different scenarios (e.g., [6,15,33]).

In many practical applications [3,9,16,17,25], patterns must be discovered from class-labeled sequential data. These applications motivate the study of the discriminative sequential pattern mining problem, whose objective is to find sequential patterns that are over-expressed in the target class. Several algorithms for this problem are already available in the literature [5,12,21,34,37,38].

Despite their success in improving the running efficiency of discriminative sequential pattern mining, existing algorithms still generate many redundant patterns. The redundancy issue can be attributed to many factors. One of