International Journal of Computer Applications (0975 – 8887)
Volume 113 – No. 16, March 2015

Trend based Approach for Time Series Representation

Sagar S. Badhiye
Department of CT, YCCE Nagpur, India

Kalyani S. Hatwar
Department of CT, YCCE Nagpur, India

P. N. Chatur, Ph.D
Department of CSE, GCOE Amravati, India

ABSTRACT
Time series representation is one of the key issues in time series data mining. A time series is simply a sequence of numbers collected at regular intervals over a period of time, typically obtained from scientific and financial applications. Time series data are characterized by large size, high dimensionality, and the need for continuous updating. A suitable choice of representation addresses the high dimensionality issue and improves the efficiency of time series data mining. Symbolic Piecewise Trend Approximation (SPTA) is proposed to improve the efficiency of time series data mining in high dimensional large databases. SPTA represents a time series in terms of its trends and their values: the sign of a value indicates the direction of change, and its magnitude indicates the degree of the local trend. Depending on its trend, the time series is divided into segments of different sizes, each of which is approximated by the ratio between the first and last points within the segment. Each segment is then represented by an alphabet symbol. The time series is thus represented as a sequence of symbols, which reduces its dimension. SPTA is validated with a naïve Bayes classification method.

Keywords
Data mining, Time series representation, Time series, Piecewise Trend Approximation

1. INTRODUCTION
Time series analysis is one of the key issues in data mining [1]. A time series is simply a sequence of numbers collected at regular intervals over a period of time, or a collection of observations indexed by the date of each observation. “A time series may be defined as a collection of reading belonging to different time periods of some economic or composite variables” (Ya-Lun-Chau).
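The SPTA steps summarized in the abstract (trend-based segmentation, a first-to-last ratio per segment, one symbol per segment) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: segments are cut where the direction of change flips, the trend value is taken as the relative change (last - first) / first so that it carries a sign, and the breakpoints and four-letter alphabet are hypothetical. The names `trend_segments` and `spta_sketch` are likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical breakpoints and alphabet; the paper's actual values are not given here.
BREAKPOINTS = [-0.2, 0.0, 0.2]
ALPHABET = "abcd"  # 'a' = strong fall ... 'd' = strong rise


def trend_segments(series):
    """Return (start, end) index pairs of maximal runs with one direction of change.

    Assumption: a new segment starts at each turning point, i.e. where the
    sign of the difference between consecutive points flips.
    """
    bounds = [0]
    prev_sign = 0
    for i in range(1, len(series)):
        diff = series[i] - series[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            bounds.append(i - 1)  # the previous point is a turning point
        if sign != 0:
            prev_sign = sign
    bounds.append(len(series) - 1)
    return list(zip(bounds[:-1], bounds[1:]))


def spta_sketch(series):
    """Map each trend segment to one symbol based on its signed relative change."""
    word = []
    for start, end in trend_segments(series):
        first, last = series[start], series[end]
        # Assumed trend value; requires first != 0.
        trend = (last - first) / first
        word.append(ALPHABET[bisect_right(BREAKPOINTS, trend)])
    return "".join(word)


series = [10.0, 12.0, 15.0, 12.0, 9.0, 9.0, 12.0]
print(trend_segments(series))  # [(0, 2), (2, 5), (5, 6)]
print(spta_sketch(series))     # dad
```

For the sample series the sketch finds three segments (rise, fall, rise), so seven points are reduced to a three-symbol word. Different breakpoints or a larger alphabet would change how finely the degree of each trend is distinguished.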
Mathematically, a time series sampled at some fixed interval h, at times $t_1, t_2, \ldots, t_N$, may be denoted by $x(t_1), x(t_2), \ldots, x(t_N)$. Data mining techniques such as clustering, classification, and association rule mining [6] are applied to time series data to retrieve useful information and knowledge from these kinds of databases. Research on time series data includes finding similar time series, dimensionality reduction, segmentation, and subsequence searching, and many researchers are working on time series data analysis. High dimensionality degrades the results of time series data mining and makes querying very costly, so the problem of high dimensionality needs to be overcome. A time series representation is used to reduce the dimension of the original time series data. Many methods are used to reduce the dimensionality of time series, including sampling, Piecewise Aggregate Approximation (PAA) [2], Dynamic Time Warping (DTW) [3], Symbolic Aggregate Approximation (SAX) [4], Piecewise Linear Approximation (PLA) [11], Piecewise Trend Approximation (PTA) [4], and Piecewise Cloud Approximation (PWCA) [5]. By transforming the time series into any of the above representations, it is possible to measure the similarity or distance between two time series in the reduced space. Although many techniques have been designed by various researchers to address dimensionality reduction, there is still some scope [5] for increasing the efficiency of time series analysis by designing an efficient time series representation technique.

2. RELATED WORK
In this paper, various dimension reduction techniques are studied; these techniques reduce the number of samples present in the time series. The problem of high dimensionality must be addressed because of its adverse effect on the results of time series data mining.
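To make the idea of reducing the number of samples and then measuring distance in the reduced space concrete, the following is a minimal Python sketch using Piecewise Aggregate Approximation (PAA), one of the methods named in the introduction. It assumes that the series length n is divisible by the number of segments p and uses a plain Euclidean distance on the reduced series; the function names `paa` and `euclidean` and the sample data are hypothetical.

```python
from math import sqrt


def paa(series, p):
    """Reduce a series of length n to p values, each the mean of one segment.

    Assumption for this sketch: p > 0 and n is divisible by p.
    """
    n = len(series)
    if p <= 0 or n % p != 0:
        raise ValueError("this sketch assumes p > 0 and n divisible by p")
    w = n // p  # number of points per segment
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(p)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


x = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 7.0, 5.0]
y = [2.0, 2.0, 3.0, 3.0, 5.0, 7.0, 8.0, 6.0]

x_red = paa(x, 4)  # [2.0, 3.0, 7.0, 6.0]
y_red = paa(y, 4)  # [2.0, 3.0, 6.0, 7.0]

print(euclidean(x, y))          # distance on the original 8 points
print(euclidean(x_red, y_red))  # distance on the 4-point representation
```

The comparison on the reduced series touches half as many values as the comparison on the raw series, which is the efficiency gain that motivates representation methods; the cost is that detail inside each segment is averaged away.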
Query accuracy and efficiency degrade as the number of dimensions grows. Tak-chung Fu has discussed various kinds of time series data related research. Time series are collections of data stored in financial, educational, medical and meteorological databases. Data mining techniques such as clustering, classification, and association rule mining are applied to time series data to retrieve useful information and knowledge from it. Various traditional dimensionality reduction techniques are explained in his paper, such as sampling, piecewise aggregate approximation, piecewise linear approximation, symbolic aggregate approximation, and the discrete Fourier transform. Dimension reduction can be done effectively by representing the time series in various ways, i.e. by reducing the number of data points of the original data [1].

The simplest method for representing a time series is sampling (Astrom, 1969). In this method, a rate of m/n is used, where m is the length of a time series P and n is the dimension after dimensionality reduction. The drawback of this method is that it distorts the shape of the sampled time series if the sampling rate is too low [1].

A more advanced method is to use the average (mean) value of each segment to represent the corresponding set of data points in the time series. In piecewise aggregate approximation (PAA), the series is divided into equal-length segments and each segment is represented by the mean of its data points. For example, a time series of n points divided into p segments (p < n) has a PAA representation of p values, each the mean of n/p consecutive points. Keogh et al. (2000a) investigate an extended version called adaptive piecewise constant approximation (APCA), in which the length of each segment is not fixed but adapts to the shape of the series. A major difference between PAA and APCA is that APCA can identify segments of variable length [10].

Another approach to reducing the dimension of time series data is to represent a time series with straight lines, i.e. Piecewise Linear Approximation (PLA), in which the approximating line for the subsequence $P(p_i, \ldots, p_j)$ is simply the line connecting the data points $p_i$ and $p_j$, i.e. the end points of consecutive segments, giving a piecewise approximation with connected lines. A piecewise linear function is a function composed of some number of linear segments defined over an equal number of intervals, generally of equal size. Reducing the dimension while preserving the salient points is a promising method. These points are called perceptually important points (PIP). With the time series P,