Privacy-Preserving Sequential Pattern Release

Huidong Jin 1,⋆, Jie Chen 1, Hongxing He 1, and Christine M. O'Keefe 2

1 CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra ACT 2601, Australia
Huidong.Jin@nicta.com.au, Jiechen@ieee.org, Hongxing.He@csiro.au
2 CSIRO Preventative Health National Research Flagship, Canberra ACT 2601, Australia
Christine.OKeefe@csiro.au

Abstract. We investigate situations where releasing frequent sequential patterns can compromise individuals' privacy. We propose two concrete objectives for privacy protection: k-anonymity and α-dissociation. The first addresses the problem of inferring patterns with very low support, say, in [1, k). Such inferred patterns can become quasi-identifiers in linking attacks. We show that, for all but one definition of support, it is impossible to reliably infer support values for patterns with two or more negative items (items which do not occur in a pattern) solely from frequent sequential patterns. For the remaining definition, we formulate privacy inference channels. α-dissociation handles the problem of high certainty in inferring sensitive attribute values. To remove privacy threats with respect to these two objectives, we show that it suffices to examine pairs of sequential patterns whose lengths differ by 1. We then establish a Privacy Inference Channels Sanitisation (PICS) algorithm. As illustrated by experiments, it can reduce the privacy disclosure risk carried by frequent sequential patterns with a small computation overhead.

1 Introduction

Data mining poses the dilemma of discovering useful knowledge from databases while avoiding privacy disclosure. There have been various research efforts on privacy-preserving data mining [1,2] from different perspectives, such as identification [3], secure computation [1] and sensitive rules [4]. However, little work has concentrated on removing privacy threats carried by data mining results [5], e.g., sequential patterns.
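To make the abstract's key observation concrete: for a frequent pattern P and a one-item extension P', the support of P with that extra item negated equals supp(P) − supp(P'), so a difference falling in [1, k) reveals a pattern held by fewer than k individuals. The following is a minimal illustrative sketch of detecting such pairs, not the paper's PICS algorithm; the function name `flag_inference_channels`, tuple-encoded patterns, count-valued supports, and the restriction to suffix-appended one-item extensions are all simplifying assumptions.

```python
def flag_inference_channels(patterns, k):
    """Flag pattern pairs whose support difference lies in [1, k).

    `patterns` maps each frequent sequential pattern (a tuple of items)
    to its support count. For a pattern p and a one-item extension q,
    supp(p) - supp(q) is the support of p with that last item negated;
    if it lies in [1, k), fewer than k individuals match it, so it can
    act as a quasi-identifier in a linking attack.
    """
    channels = []
    for p, supp_p in patterns.items():
        for q, supp_q in patterns.items():
            # Consider only extensions of p by one appended item
            # (length difference of 1, as in the abstract).
            if len(q) == len(p) + 1 and q[:len(p)] == p:
                diff = supp_p - supp_q
                if 1 <= diff < k:
                    channels.append((p, q, diff))
    return channels
```

For example, if pattern (a) has support 10 and (a, b) has support 8, then the pattern "a but not b" is held by only 2 individuals; with k = 5 this pair would be flagged as a privacy inference channel.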
In this work we study how released sequential patterns can threaten privacy. We cover both disclosure of sensitive attribute values and identification disclosure, which concerns the anonymity of individuals. Our research motivation comes from the healthcare domain, where protecting patients' privacy, such as anonymity and health status, is crucial. In Australia,

⋆ Huidong Jin is currently with National ICT Australia, Canberra Lab, Australia. National ICT Australia is funded by the Australian Government's Department of Communications, Information Technology, and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence programs. The authors thank D. Lovell, W. Müller, D. McAullay and anonymous reviewers for their comments and suggestions.

Z.-H. Zhou, H. Li, and Q. Yang (Eds.): PAKDD 2007, LNAI 4426, pp. 547–554, 2007.
© Springer-Verlag Berlin Heidelberg 2007