978-1-5090-5548-7/16/$31.00 ©2016 IEEE
Statistical and Machine Learning Approach in
Forex Prediction Based on Empirical Data
Sitti Wetenriajeng Sidehabi
Department of Electrical Engineering
Politeknik ATI Makassar
Makassar, Indonesia
tenri616@gmail.com
Indrabayu¹, Sofyan Tandungan²
Department of Informatics Engineering
Universitas Hasanuddin
Makassar, Indonesia
indrabayu@unhas.ac.id, standungan@gmail.com
Abstract—This study offers new insight by comparing two common
approaches to prediction from time-series data, namely statistical
methods and machine learning. Both approaches are applied to
predicting Forex (Foreign Exchange) rates. The statistical method
used in this paper is Adaptive Spline Threshold Autoregression
(ASTAR), while for machine learning, Support Vector Machine (SVM)
and a hybrid Genetic Algorithm-Neural Network (GA-NN) are chosen.
The accuracy of the three methods is compared using root mean
squared error (RMSE). It is found that ASTAR and GA-NN each have
advantages, depending on the time interval of the prediction period.
Keywords—forex, prediction, ASTAR, GA-NN, SVM, RMSE
I. INTRODUCTION
Forex (Foreign Exchange) is a type of transaction in which a
party uses some units of one currency to buy a proportional
amount of another currency. This exchange is usually conducted
in currency pairs. The most popular and most traded pair
worldwide is Euro vs. US Dollar (EUR/USD). In Forex, there are
two kinds of analysis: fundamental and technical. Fundamental
analysis relates market movements to news or factors that can
affect a country's economy, while technical analysis mainly
observes supply and demand trends in market movements by reading
charts and indicators of the ongoing market price.
In most cases, technical prediction of Forex rates is based on
statistical charts and machine learning. It is always interesting to
compare these two procedures in time-series prediction, since
neither scheme is likely to be better than the other in every
case [1]. Statistical modelling and forecasting using Auto-Regressive
Integrated Moving Average (ARIMA) for Gold Bullion Coin has shown
promising results, with a MAPE (mean absolute percentage error)
within 10% [2]. Artificial Intelligence approaches have been studied
alongside statistical and machine learning methods. A novel approach
for efficient weekly market price forecasting achieved an
outstanding accuracy of 99.62% [3]. Recently, a hybrid Artificial
Intelligence method has also achieved prediction on a 30-minute
time frame [4]. This breakthrough allows practical application by
traders seeking profit within the time frame, since all the price
indicators, i.e. open, close, high and low, are predicted as well.
Each of these previous studies in price forecasting was conducted
on a single method. This study aims to apply Adaptive Spline
Threshold Autoregression (ASTAR), a combination of Genetic
Algorithm-Neural Network (GA-NN), and Support Vector Machine (SVM)
to Forex rate prediction and to provide a computational comparison
of the performance of these techniques.
A. Adaptive Spline Threshold Autoregression (ASTAR)
ASTAR is a model obtained by applying the Multivariate Adaptive
Regression Spline (MARS) method to nonlinear threshold time-series
modeling, where the predictors are lagged values of the time
series [5]. ASTAR can generate continuous models with underlying
limit cycles when the time series data indicate periodic behaviour.
Like MARS, ASTAR is built from two complementary stepwise
algorithms, which obtain the basis functions for the model and
select the most appropriate model. The first step is the forward
stepwise algorithm, which yields a model with a very complex
structure. The second step is the backward stepwise algorithm, in
which basis functions from the previous step are removed in turn
to reach the optimum model. An example ASTAR model is as follows:
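$$Z_t = c + \phi_1 (Z_{t-d_1} - t_1)_+ + \phi_2 (Z_{t-d_2} - t_2)_+ + \phi_3 (Z_{t-d_1} - t_1)_+ (Z_{t-d_2} - t_2)_+ + \cdots + \varepsilon_t \qquad (1)$$
where:
$c$ = constant
$\phi$ = coefficient
$t_1, t_2$ = thresholds of the variables $Z_{t-d_1}$ and $Z_{t-d_2}$
$Z_{t-d_1}, Z_{t-d_2}$ = lagged predictor variables, with lags $d_1, d_2$.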
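To make the structure of Eq. (1) concrete, the following minimal Python sketch evaluates its deterministic part (without the error term) for a single time index. It assumes that both factors of the interaction term are truncated at zero, as in MARS. The function names `hinge` and `astar_predict`, the sample series, and all coefficient, threshold, and lag values are hypothetical and chosen only for illustration; they are not fitted values from this study.

```python
def hinge(x, threshold):
    """Truncated linear basis function (x - threshold)_+ used in Eq. (1)."""
    return max(x - threshold, 0.0)


def astar_predict(series, t, c, phi, thresholds, lags):
    """Evaluate c + phi1*h1 + phi2*h2 + phi3*h1*h2 at time index t.

    h1 and h2 are the hinge functions of the lagged values
    series[t - d1] and series[t - d2].
    """
    phi1, phi2, phi3 = phi
    t1, t2 = thresholds
    d1, d2 = lags
    h1 = hinge(series[t - d1], t1)
    h2 = hinge(series[t - d2], t2)
    return c + phi1 * h1 + phi2 * h2 + phi3 * h1 * h2


# Hypothetical exchange-rate series and model parameters (illustration only).
rates = [1.10, 1.12, 1.11, 1.15, 1.14]
pred = astar_predict(
    rates, t=4, c=1.0, phi=(0.5, 0.25, 2.0), thresholds=(1.10, 1.10), lags=(1, 2)
)
print(round(pred, 4))
```

A lagged value below its threshold contributes nothing to the prediction, which is how the thresholds switch parts of the model on and off across different regimes of the series.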
B. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a machine learning method that
is trained on pairs of input data and desired target outputs.
The concept of SVM can be explained simply as the search for the
best hyperplane that serves as a separator of two classes in the
input space [6].
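The idea of a separating hyperplane can be illustrated with a small Python sketch. It assumes a linear hyperplane of the form w·x + b = 0 whose weights are picked by hand rather than learned, so it shows only how a given hyperplane classifies points and how wide its margin is, not the SVM training procedure. The names `decision_value`, `classify`, and `margin_width`, and all numeric values, are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b of point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance 2 / ||w|| between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hand-picked (not learned) hyperplane x1 + x2 = 3 and four sample points.
w, b = (1.0, 1.0), -3.0
points = [(1.0, 1.0), (2.0, 0.0), (2.0, 2.0), (3.0, 1.0)]
labels = [classify(w, b, x) for x in points]
print(labels, margin_width(w))
```

Among all hyperplanes that separate the two classes, the "best" one in the SVM sense is the one with the largest margin width.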
SVM was developed by Boser, Guyon, and Vapnik, and
was first presented in 1992 at the Annual Workshop on
Computational Learning Theory. The basic concept of SVM is
actually a harmonious combination of computational theories
that had existed for decades, such as the margin hyperplane,