EFFICIENT CONSTRAINT-BASED SEQUENTIAL PATTERN MINING USING DATASET FILTERING TECHNIQUES

Tadeusz Morzy, Marek Wojciechowski, and Maciej Zakrzewicz
Poznan University of Technology, Institute of Computing Science
ul. Piotrowo 3a, 60-965 Poznan, Poland

Abstract

The basic formulation of the sequential pattern discovery problem assumes that the only constraint to be satisfied by discovered patterns is the minimum support threshold. However, users often want to restrict the set of patterns to be discovered by adding extra constraints on the structure of patterns. Data mining systems should be able to exploit such constraints to speed up the mining process. In this paper we discuss efficient constraint-based sequential pattern mining using dataset filtering techniques. We show how to transform a given data mining task into an equivalent one operating on a smaller dataset. We present an extension of the GSP algorithm using dataset filtering techniques and experimentally evaluate the performance gains offered by the proposed method.

Keywords: data mining, sequential patterns

1. Introduction

Data mining aims at the discovery of useful patterns in large databases or data warehouses. One data mining method is sequential pattern discovery, introduced in [2]. Informally, sequential patterns are the most frequently occurring subsequences in sequences of sets of items. Most of the many proposed sequential pattern mining algorithms are designed to discover all sequential patterns exceeding a user-specified minimum support threshold. Some of them (e.g. GSP [8]) also allow users to specify time constraints to be taken into account when checking whether a given data-sequence contains a given subsequence.
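To make the notions of containment and support concrete, the following Python sketch checks whether a data-sequence contains a sequence using the usual definition: each element of the sequence must be a subset of a distinct transaction, and the matched transactions must appear in the same order. It ignores time constraints such as those supported by GSP. The function names `contains`, `support`, and `filter_by_required_item` are hypothetical. The item-based filter is only an illustrative assumption of what shrinking the dataset can look like when every pattern of interest must contain a given item; it is not the filtering method proposed in this paper, nor part of GSP.

```python
from typing import FrozenSet, List, Sequence

# A data-sequence is an ordered list of transactions (itemsets).
Itemset = FrozenSet[str]
DataSequence = List[Itemset]


def contains(data_seq: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if pattern is contained in data_seq (no time constraints).

    Greedily matches each pattern element to the earliest later transaction
    that is a superset of it; greedy matching is sufficient in this case.
    """
    pos = 0
    for element in pattern:
        # Skip transactions that do not include the whole element.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next element must match a strictly later transaction
    return True


def support(db: Sequence[DataSequence], pattern: Sequence[Itemset]) -> int:
    """Absolute support: number of data-sequences that contain pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


def filter_by_required_item(db: Sequence[DataSequence], item: str) -> List[DataSequence]:
    """Hypothetical filter: keep only data-sequences that mention item.

    If every pattern of interest must contain item, the dropped sequences
    cannot support any such pattern, so absolute support counts of those
    patterns are the same on the smaller dataset.
    """
    return [seq for seq in db if any(item in t for t in seq)]


# Small illustrative database of three data-sequences (single-letter items).
db = [
    [frozenset("a"), frozenset("bc"), frozenset("d")],
    [frozenset("ab"), frozenset("d")],
    [frozenset("c"), frozenset("a")],
]
pattern = [frozenset("a"), frozenset("d")]

filtered = filter_by_required_item(db, "d")
print(support(db, pattern))        # 2
print(len(filtered))               # 2
print(support(filtered, pattern))  # 2
```

Because the hypothetical filter only removes sequences that cannot contain a qualifying pattern, absolute support counts are preserved. A minimum support threshold given as a fraction, however, should still be converted to a count using the size of the original database.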