978-1-5090-5548-7/16/$31.00 ©2016 IEEE

Statistical and Machine Learning Approach in Forex Prediction Based on Empirical Data

Sitti Wetenriajeng Sidehabi
Department of Electrical Engineering
Politeknik ATI Makassar
Makassar, Indonesia
tenri616@gmail.com

Indrabayu¹, Sofyan Tandungan²
Department of Informatics Engineering
Universitas Hasanuddin
Makassar, Indonesia
indrabayu@unhas.ac.id, standungan@gmail.com

Abstract—This study offers a new perspective by comparing two common approaches to data-series prediction: statistical methods and machine learning. These techniques are applied to predicting Forex (Foreign Exchange) rates. The statistical method used in this paper is Adaptive Spline Threshold Autoregression (ASTAR). For machine learning, Support Vector Machine (SVM) and a hybrid Genetic Algorithm-Neural Network (GA-NN) are chosen. The accuracy of the three methods is compared using the root mean squared error (RMSE). ASTAR and GA-NN are each found to have advantages, depending on the time interval.

Keywords—forex, prediction, ASTAR, GA-NN, SVM, RMSE

I. INTRODUCTION

Forex (Foreign Exchange) is a type of transaction in which a party exchanges units of one currency for a proportional amount of another currency. These exchanges are usually conducted in currency pairs. The most popular and most traded pair worldwide is Euro vs. US Dollar (EUR/USD). There are two kinds of Forex analysis: fundamental analysis and technical analysis. Fundamental analysis relates market movements to news or factors that can affect a country's economy. Technical analysis mainly observes supply and demand trends in market movements by reading charts and indicators of the ongoing market price. In most cases, technical prediction of Forex rates is based on statistical charts and machine learning.
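The abstract names RMSE as the accuracy measure used to compare the three methods. It is defined as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is an observed rate and $\hat{y}_i$ is the corresponding prediction. Below is a minimal Python sketch of that computation. The function name `rmse` and the sample EUR/USD values are illustrative assumptions, not the authors' code or data.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted rates."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must be non-empty")
    # Square each prediction error, average the squares, then take the root.
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical EUR/USD closing rates and model predictions, for illustration only.
actual_rates = [1.1000, 1.1010, 1.0990]
predicted_rates = [1.1005, 1.1000, 1.1000]
print(rmse(actual_rates, predicted_rates))
```

RMSE is expressed in the same units as the rate itself. Because the errors are squared, a few large misses are penalized more heavily than many small ones.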
Comparing these two approaches in data-series prediction remains of interest, because neither scheme is consistently better than the other in every case [1]. Statistical modelling and forecasting for the Gold Bullion Coin using the Auto-Regressive Integrated Moving Average (ARIMA) model has shown promising results, with a MAPE (mean absolute percentage error) within 10% [2]. Artificial intelligence methods have also been researched alongside statistical and machine learning methods. One novel approach for efficient weekly market price forecasting achieved an accuracy rate of 99.62% [3]. More recently, a hybrid artificial intelligence method has been applied to prediction on a 30-minute time frame [4]. Because all price indicators (open, close, high, and low) are predicted, traders can use it in practice to seek profit within that time frame. Each of these previous price-forecasting studies focused on a single method. This study aims to apply Adaptive Spline Threshold Autoregression (ASTAR), a combined Genetic Algorithm-Neural Network (GA-NN), and Support Vector Machine (SVM) to Forex rate prediction, and to provide a computational comparison of their performance.

A. Adaptive Spline Threshold Autoregression (ASTAR)

ASTAR is a threshold model for nonlinear time series. It is obtained by applying the Multivariate Adaptive Regression Spline (MARS) method with lagged values of the time series as predictors [5]. ASTAR can generate continuous models with underlying limit cycles when the time series data exhibit periodic behaviour. Like MARS, ASTAR is built from two complementary stepwise algorithms: one generates the basis functions of the model, and the other selects the most appropriate model. The first step is the forward stepwise algorithm, which produces a model with a very complex structure.
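The forward step just described can be sketched in Python with NumPy. This is a deliberately simplified illustration. It builds lagged predictors $Z_{t-d}$ and greedily adds mirrored truncated-linear (hinge) pairs $(Z_{t-d}-\tau)_+$ and $(\tau-Z_{t-d})_+$, where $(x)_+ = \max(0, x)$. At each step it keeps the lag $d$ and threshold $\tau$ that give the lowest residual sum of squares. It omits interaction terms, the backward pruning step, and the generalized cross-validation criterion used by full MARS/ASTAR implementations. The function names `hinge`, `lagged_design`, and `forward_pass` and the parameters `max_lag` and `max_pairs` are hypothetical and do not come from the paper.

```python
import numpy as np


def hinge(x, knot):
    """Truncated linear basis function (x - knot)_+ = max(0, x - knot)."""
    return np.maximum(0.0, x - knot)


def lagged_design(z, max_lag):
    """Return (lags, y): column d-1 of lags holds Z_{t-d}, y holds Z_t."""
    n = len(z)
    y = z[max_lag:]
    lags = np.column_stack([z[max_lag - d:n - d] for d in range(1, max_lag + 1)])
    return lags, y


def forward_pass(z, max_lag=3, max_pairs=2):
    """Simplified ASTAR-style forward step (additive terms only, no pruning).

    Each iteration tries every lag d and every observed value of Z_{t-d}
    as a threshold. It then keeps the mirrored hinge pair that gives the
    lowest residual sum of squares.
    """
    lags, y = lagged_design(np.asarray(z, dtype=float), max_lag)
    basis = [np.ones(len(y))]  # intercept column for the constant c
    terms = []                 # (lag d, threshold) for each added pair

    def fit(columns):
        # Ordinary least squares on the current set of basis columns.
        X = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef, float(np.sum((y - X @ coef) ** 2))

    for _ in range(max_pairs):
        best = None
        for d in range(1, max_lag + 1):
            x = lags[:, d - 1]
            for knot in np.unique(x):
                pair = [hinge(x, knot), hinge(-x, -knot)]  # (x-t)_+ and (t-x)_+
                _, rss = fit(basis + pair)
                if best is None or rss < best[0]:
                    best = (rss, d, knot, pair)
        _, d, knot, pair = best
        basis += pair
        terms.append((d, float(knot)))

    coef, rss = fit(basis)
    return terms, coef, rss
```

For example, `terms, coef, rss = forward_pass(series, max_lag=3, max_pairs=2)` returns the selected (lag, threshold) pairs, the fitted coefficients (intercept first), and the final residual sum of squares. Each step searches every observed value as a candidate threshold. That costs one least-squares fit per candidate, which is acceptable for short series but is the part real implementations optimize.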
The second step is the backward stepwise algorithm, which prunes basis functions from the model obtained in the previous step to reach an optimal model. An example ASTAR model is:

$$Z_t = c + \phi_1 (Z_{t-d_1} - t_1)_+ + \phi_2 (Z_{t-d_2} - t_2)_+ + \phi_3 (Z_{t-d_1} - t_1)_+ (Z_{t-d_2} - t_2)_+ + \cdots + \varepsilon_t \tag{1}$$

where $c$ is a constant; $\phi_1, \phi_2, \phi_3$ are coefficients; $t_1$ and $t_2$ are the thresholds of the variables $Z_{t-d_1}$ and $Z_{t-d_2}$; $d_1$ and $d_2$ are the lags of the predictor variables; $\varepsilon_t$ is the error term; and $(x)_+ = \max(0, x)$.

B. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning method trained on pairs of input data and desired output targets. Put simply, SVM searches for the best hyperplane separating two classes in the input space [6]. SVM was developed by Boser, Guyon, and Vapnik and was first presented in 1992 at the Annual Workshop on Computational Learning Theory. The basic concept of SVM combines computational theories that had existed decades earlier, such as the margin hyperplane,