Adaptive Normalization: A Novel Data Normalization Approach for Non-Stationary Time Series

Eduardo Ogasawara, Leonardo C. Martinez, Daniel de Oliveira, Geraldo Zimbrão, Gisele L. Pappa and Marta Mattoso

Abstract - Data normalization is a fundamental preprocessing step for mining and learning from data. However, finding an appropriate method for normalizing time series is not a simple task, because most traditional normalization methods make assumptions that do not hold for most time series. The first assumption is that all time series are stationary, i.e., their statistical properties, such as mean and standard deviation, do not change over time. The second assumption is that the volatility of the time series is uniform. None of the methods currently available in the literature addresses these issues. This paper proposes a new method for normalizing non-stationary heteroscedastic (with non-uniform volatility) time series. The method, named Adaptive Normalization (AN), was tested together with an Artificial Neural Network (ANN) on three forecasting problems. The results were compared with those of four other traditional normalization methods and showed that AN improves ANN accuracy in both short- and long-term predictions.

I. INTRODUCTION

Any application that deals with data requires a lot of time and effort for data preparation [1-3]. The main goal of data preparation is to guarantee the quality of the data before it is fed to any learning algorithm; it includes data cleaning, integration, transformation, and reduction. This paper focuses on data transformation methods, especially normalization, when dealing with time series data.
The most common normalization methods used during data transformation include min-max (where the data inputs are mapped into a predefined range, from 0 or −1 to 1), z-score (where the values of an attribute A are normalized according to its mean and standard deviation), and decimal scaling (where the decimal point of the values of an attribute A is moved according to its maximum absolute value). However, these methods are not always applicable to time series data. Consider the min-max and decimal scaling methods, for instance. Their applicability depends on knowing the minimum and/or maximum values of a time series, which is not always possible. We can assume these values are present in a time series sample, but future data might be out of bounds. The z-score method, in contrast, is useful when the minimum and maximum values of an attribute are unknown, and can be applied to stationary time series [4,5], i.e., time series whose statistical properties, such as mean, variance, and autocorrelation, are constant over time. However, in the real world, most financial and economic time series are non-stationary [6]. In contrast with stationary time series, the statistical properties of non-stationary series do vary over time.

Eduardo Ogasawara, Daniel de Oliveira, Geraldo Zimbrão and Marta Mattoso are with the Department of Computer Science, Federal University of Rio de Janeiro - Brazil (email: {ogasawara, danielc, zimbrao, marta}@cos.ufrj.br). Leonardo Martinez and Gisele Pappa are with the Department of Computer Science, Federal University of Minas Gerais - Brazil (email: {leocm, glpappa}@dcc.ufmg.br).

Fig. 1 Monthly average exchange rate of U.S. Dollar to Brazilian Real time series

To illustrate the concepts described above, Fig. 1 presents the monthly time series of average exchange rates of U.S. Dollar to Brazilian Real from January 1999 to December 2009.
Complementing Fig. 1, Table I shows the mean, standard deviation, and minimum and maximum values of the series per year. Observe that these values change over time. For instance, the exchange rate mean in 1999 was 1.81, and this value rose to 3.08 in 2003. These variations show that the time series in Fig. 1 is non-stationary.

TABLE I
STATISTICS OF THE MONTHLY AVERAGE EXCHANGE RATE OF U.S. DOLLAR TO BRAZILIAN REAL TIME SERIES

Year   Mean   Std. Deviation   Min    Max
1999   1.81   0.13             1.50   1.97
2000   1.83   0.07             1.74   1.96
2001   2.35   0.25             1.95   2.74
2002   2.92   0.56             2.32   3.81
2003   3.08   0.26             2.86   3.59
2004   2.93   0.12             2.72   3.13
2005   2.44   0.17             2.21   2.70
2006   2.18   0.04             2.13   2.27
2007   1.95   0.13             1.77   2.14
2008   1.83   0.28             1.59   2.39
2009   2.00   0.23             1.73   2.31

[Fig. 1 plot: vertical axis labeled US$, from 0.00 to 4.50; horizontal axis in months, from 01/1999 to 11/2009; annotated regions: "High volatility" and "Low volatility".]
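
The limitations of the traditional methods can be made concrete with the values in Table I. The following is a minimal Python sketch, assuming the usual textbook forms of min-max, z-score, and decimal scaling; the function names and the choice of the 1999 row as the "known sample" are illustrative assumptions, not part of the method proposed in this paper.

```python
import math


def min_max(x, lo, hi, new_lo=0.0, new_hi=1.0):
    """Map x from the sample range [lo, hi] into [new_lo, new_hi]."""
    return (x - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


def z_score(x, mean, std):
    """Express x as a number of standard deviations away from the mean."""
    return (x - mean) / std


def decimal_scaling(x, max_abs):
    """Divide x by the smallest power of 10 that brings max_abs below 1.

    Assumes max_abs > 0.
    """
    j = math.floor(math.log10(max_abs)) + 1
    return x / 10 ** j


# Statistics of the 1999 row of Table I, treated here as the known sample
# (an illustrative assumption).
MEAN_1999, STD_1999, MIN_1999, MAX_1999 = 1.81, 0.13, 1.50, 1.97

# Values from later rows of Table I, playing the role of "future" data.
MAX_2002 = 3.81
MEAN_2003 = 3.08

# Min-max fitted on 1999 maps the 2002 maximum far outside [0, 1].
out_of_bounds = min_max(MAX_2002, MIN_1999, MAX_1999)

# Z-score with 1999 statistics places the 2003 mean many standard
# deviations away from the 1999 mean.
shifted = z_score(MEAN_2003, MEAN_1999, STD_1999)

# Decimal scaling of the 1999 maximum by its own maximum absolute value.
scaled = decimal_scaling(MAX_1999, MAX_1999)

print(out_of_bounds, shifted, scaled)
```

With these inputs, the min-max result is about 4.9 instead of a value in [0, 1], and the z-score of the 2003 mean under 1999 statistics is about 9.8. Both effects follow directly from the non-stationarity visible in Table I: parameters estimated on one period do not describe later periods.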