Information Sciences 478 (2019) 524–539
Mining conditional discriminative sequential patterns
Zengyou He (a,b,∗), Simeng Zhang (a), Feiyang Gu (c), Jun Wu (d)
a School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
b Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Tuqiang Road 321, Dalian 116620, China
c Baidu Inc., Beijing, China
d School of Information Engineering, Zunyi Normal University, Zunyi, China
Article info
Article history:
Received 26 May 2018
Revised 5 October 2018
Accepted 17 November 2018
Available online 19 November 2018
Keywords:
Sequential pattern
Discriminative pattern
Pattern mining
Pattern-based classification
Abstract
Discriminative sequential pattern mining, which aims to extract sequential patterns that differ significantly among classes, is one of the most important topics in pattern mining and has a wide range of applications. In recent years, a variety of algorithms for mining discriminative sequential patterns have been proposed, but these algorithms still generate many redundant patterns. Many factors may lead to redundancy among the reported patterns, of which subset-induced redundancy is the most critical: some patterns are reported as discriminative mainly because some of their sub-patterns are strongly discriminative. To address subset-induced redundancy, we propose the concept of a conditional discriminative sequential pattern and design a new algorithm called CDSPM (Conditional Discriminative Sequential Pattern Mining) for extracting such patterns. Experimental results on real data sets show that CDSPM can effectively remove discriminative sequential patterns that are redundant with respect to their sub-patterns.
© 2018 Elsevier Inc. All rights reserved.
1. Introduction
Sequential data consist of a set of sequences, where each sequence is an ordered list of discrete events or elements. Examples of sequential data include biological sequences, web-usage data, speech signals, human activities [26,27], and multi-neuronal spike trains [30]. Since Agrawal and Srikant first put forward the concept of the frequent sequential pattern [1], many efficient algorithms have been proposed for mining such patterns from sequential data (e.g., GSP [31], SPAM [2], LAPIN [36], FreeSpan [18], PrefixSpan [19]). Meanwhile, the problem of frequent sequential pattern mining has been extended to different scenarios (e.g., [6,15,33]).
In many practical applications [3,9,16,17,25], it is necessary to conduct pattern discovery from class-labeled sequential data. These applications motivate the study of the discriminative sequential pattern mining problem, whose objective is to find sequential patterns that are over-expressed in the target class. Several algorithms for this problem are already available in the literature [5,12,21,34,37,38].
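As an informal illustration of what "over-expressed in the target class" can mean, the following Python sketch computes the support of a pattern in two small sets of sequences and takes the difference. It is a minimal sketch under stated assumptions, not the measure used by any of the cited algorithms: each event is assumed to be a single symbol, a pattern is assumed to match as a subsequence with gaps allowed, and the helper names (`contains`, `support`, `support_difference`) and the toy data are hypothetical.

```python
from typing import List, Sequence


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence.

    Assumption: gaps between matched events are allowed.
    """
    it = iter(sequence)
    # `event in it` consumes the iterator up to the first match,
    # so later pattern events must appear after earlier ones.
    return all(event in it for event in pattern)


def support(dataset: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    return sum(contains(seq, pattern) for seq in dataset) / len(dataset)


def support_difference(
    target: List[Sequence[str]],
    background: List[Sequence[str]],
    pattern: Sequence[str],
) -> float:
    """Hypothetical discriminative score: support gap between two classes."""
    return support(target, pattern) - support(background, pattern)


# Toy class-labeled data (illustrative only).
target = [list("abcd"), list("acbd"), list("abd"), list("bca")]
background = [list("dcba"), list("abcd"), list("cda"), list("bdc")]
pattern = list("ab")

print(support(target, pattern))      # support in the target class
print(support(background, pattern))  # support in the other class
print(support_difference(target, background, pattern))
```

A larger positive difference indicates a pattern that occurs more often in the target class than in the other class; other contrast measures could be substituted for the difference without changing the rest of the sketch.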
Despite their success in improving the running efficiency of discriminative sequential pattern mining, existing algorithms still suffer from generating many redundant patterns. The redundancy issue can be attributed to many factors. One of
∗ Corresponding author.
E-mail addresses: zyhe@dlut.edu.cn (Z. He), zhangsimeng1228@163.com (S. Zhang), flysea.gu@gmail.com (F. Gu), wujun.myway@gmail.com (J. Wu).
https://doi.org/10.1016/j.ins.2018.11.043
0020-0255/© 2018 Elsevier Inc. All rights reserved.