Data Min Knowl Disc (2007) 15:107–144
DOI 10.1007/s10618-007-0064-z

Experiencing SAX: a novel symbolic representation of time series

Jessica Lin · Eamonn Keogh · Li Wei · Stefano Lonardi

Received: 15 June 2006 / Accepted: 10 January 2007 / Published online: 3 April 2007
Springer Science+Business Media, LLC 2007

Responsible editor: Johannes Gehrke.

J. Lin (corresponding author)
Information and Software Engineering Department, George Mason University, Fairfax, VA 22030, USA
e-mail: jessica@ise.gmu.edu

E. Keogh · L. Wei · S. Lonardi
Computer Science & Engineering Department, University of California-Riverside, Riverside, CA 92521, USA
e-mail: eamonn@cs.ucr.edu

L. Wei
e-mail: wli@cs.ucr.edu

S. Lonardi
e-mail: stelo@cs.ucr.edu

Abstract Many high-level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to take advantage of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as that of the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction,