Performance of Modeling Time Series Using Nonlinear Autoregressive with eXogenous input (NARX) in the Network Traffic Forecasting
Haviluddin
Faculty of Mathematics and Natural Science, Dept. of Computer Science, Universitas Mulawarman, Indonesia
haviluddin@unmul.ac.id
Rayner Alfred
Faculty of Computing and Informatics, Dept. of Computer Science, Universiti Malaysia Sabah, Malaysia
ralfred@ums.edu.my
Abstract
A tool for analyzing and predicting network traffic usage as time series data is very important for ensuring that an organization (e.g., a university) receives network services of acceptable quality. This paper presents the modeling of network traffic datasets using a nonlinear autoregressive with exogenous input (NARX) algorithm. The best-performing NARX model, with the data divided in the ratio 189:31:94 samples (60%:10%:30%) and a delay value of 5, achieves a Mean Squared Error (MSE) of 0.006717 and a correlation coefficient, r, of 0.90764. In short, the results show that the NARX technique learns network traffic effectively, with acceptable predictive accuracy.
Keywords—NARX; network traffic; MSE; correlation coefficient
I. INTRODUCTION
Time series analysis tools for modeling and forecasting are widely used in various fields, including economics (e.g., business, finance, foreign exchange, and stock markets), investment, engineering, energy, the internet, and network traffic. Accurate prediction is essential for supporting decision making. Numerous strategies have been established in the literature within the general framework of time series prediction. These techniques can be grouped into two main categories: statistical methods and machine learning (ML) methods.
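The abstract reports accuracy as the mean squared error (MSE) and the correlation coefficient r. The following Python sketch computes both measures for a sequence of actual values and one-step-ahead forecasts. It assumes that r denotes the Pearson correlation between actual and predicted values; the sample numbers are hypothetical and are not taken from the paper's dataset.

```python
import math

def mse(actual, predicted):
    """Mean squared error: the average squared difference between targets and forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between targets and forecasts (assumed meaning of r)."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical normalized traffic values and forecasts (illustrative only).
actual = [0.20, 0.35, 0.50, 0.40, 0.30]
predicted = [0.22, 0.33, 0.47, 0.43, 0.28]
print(f"MSE = {mse(actual, predicted):.6f}, r = {pearson_r(actual, predicted):.5f}")
```

A lower MSE means smaller forecast errors, while an r close to 1 means the forecasts track the rises and falls of the actual traffic; the two measures are reported together because a forecast can correlate well with the data yet still be offset in level.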
Several methods are derived from statistics, such as the autoregressive (AR), moving average (MA), autoregressive moving-average (ARMA), autoregressive integrated moving-average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), and seasonal autoregressive integrated moving-average (SARIMA) models. Statistical methods are reliable enough for forecasting when the amount of data is modest and the data are linear. However, their forecasts become less accurate on large amounts of data, because the resulting mathematical models are quite complicated and difficult to apply to nonlinear data [1-3]. Besides these statistical models, machine learning (ML) methods have also been used. For instance, Artificial Neural Networks (ANN), one family of ML methods, have been widely used for analyzing and forecasting time series data over the past four decades. Many researchers have adopted ANN as a time series analysis method because of its efficiency in solving both linear and nonlinear problems [4-6]. ANN variants include the multilayer perceptron with backpropagation (BP), recurrent neural networks (RNN), and radial basis function (RBF) neural networks, which can provide efficient and accurate forecasts and are particularly able to handle nonlinear data that represent real-world behavior [1, 2, 7-12].
The motivation of this paper is to present a neural network topology and training scheme that can forecast network traffic with acceptable accuracy using one-step-ahead prediction. It is hoped that this paper provides insights that help network engineers manage bandwidth traffic efficiently for the campus community.
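To make "one-step-ahead prediction" concrete, the sketch below uses the simplest of the statistical methods listed above: a linear AR(p) model fitted by ordinary least squares, which forecasts the next value from the p most recent ones. This is an illustration of the forecasting setup, not the authors' method; it assumes NumPy is available, and the function names `lagged_matrix`, `fit_ar`, and `predict_next` are hypothetical.

```python
import numpy as np

def lagged_matrix(series, p):
    """Return rows [y(t-1), ..., y(t-p)] and targets y(t) for t = p, ..., n-1."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    X = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    return X, y[p:]

def fit_ar(series, p):
    """Least-squares AR(p) fit with intercept; returns [c, phi_1, ..., phi_p]."""
    X, target = lagged_matrix(series, p)
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_next(series, coef):
    """One-step-ahead forecast y(t+1) = c + phi_1*y(t) + ... + phi_p*y(t-p+1)."""
    p = len(coef) - 1
    recent = np.asarray(series[-p:], dtype=float)[::-1]  # most recent value first
    return float(coef[0] + coef[1:] @ recent)

# Usage on a hypothetical noise-free AR(1) series y(t) = 1 + 0.5*y(t-1), y(0) = 0.
series = [0.0]
for _ in range(11):
    series.append(1.0 + 0.5 * series[-1])
coef = fit_ar(series, p=1)
print(coef, predict_next(series, coef))
```

Because the generating process here is exactly linear, least squares recovers the intercept 1 and coefficient 0.5; a neural model such as NARX replaces this fixed linear combination of lags with a learned nonlinear function, which is the motivation given above for using ANN on nonlinear traffic data.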
This paper studies the Nonlinear AutoRegressive with eXogenous input (NARX) neural network model in order to address time series data with nonlinear characteristics. Section II describes the methodology used in this work. Section III outlines the experimental setup. Section IV presents and discusses the results, and Section V concludes the paper.
II. METHODOLOGY
This section presents related work on general network traffic prediction models, including time series analysis performed with the NARX model.
A. Time Series
A time series dataset consists of observations ordered in time. In principle, a time series model predicts future values of the series from its past observations [13]. In this study, the time series dataset was obtained from the ICT server of Universitas Mulawarman. The network traffic data were captured with the CACTI software from 20 to 26 June 2013 (314 samples). The dataset and its plot are shown in Table I and Fig. 1.
978-1-4799-8386-5/15/$31.00 ©2015 IEEE 164 2015 International Conference on Science in Information Technology (ICSITech)
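A NARX model predicts the next value as y(t) = f(y(t-1), ..., y(t-d), u(t-1), ..., u(t-d)), where u is the exogenous input and f is the function learned by the neural network. The Python sketch below illustrates two data-preparation steps implied by the abstract: building tapped-delay regressors with a delay of 5, and dividing the 314 samples into contiguous blocks of 189, 31, and 94. Several points are assumptions rather than details from the paper: the same delay is applied to both y and u, the exogenous series is a hypothetical placeholder (this section does not state which input was used), the traffic values are synthetic, and the rounding rule in the hypothetical helper `split_counts` is simply one rule that reproduces the reported counts. NumPy is assumed to be available.

```python
import numpy as np

def narx_regressors(y, u, delay=5):
    """Build NARX inputs [y(t-1..t-delay), u(t-1..t-delay)] and targets y(t).

    y is the fed-back target series and u the exogenous input series; both are
    assumed to be aligned 1-D sequences of equal length.
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if y.shape != u.shape:
        raise ValueError("y and u must have the same length")
    n = len(y)
    y_lags = [y[delay - k : n - k] for k in range(1, delay + 1)]
    u_lags = [u[delay - k : n - k] for k in range(1, delay + 1)]
    X = np.column_stack(y_lags + u_lags)
    return X, y[delay:]

def split_counts(n, val_ratio=0.10, test_ratio=0.30):
    """Contiguous block sizes: validation and test are rounded, training takes the rest."""
    n_val = round(n * val_ratio)
    n_test = round(n * test_ratio)
    return n - n_val - n_test, n_val, n_test

# Synthetic stand-in data with the same length as the June 2013 dataset.
t = np.arange(314)
traffic = 0.5 + 0.3 * np.sin(2 * np.pi * t / 50)  # placeholder for measured traffic
exog = np.cos(2 * np.pi * t / 25)                 # hypothetical exogenous input

n_train, n_val, n_test = split_counts(len(traffic))
print(n_train, n_val, n_test)  # 189 31 94
X, target = narx_regressors(traffic, exog, delay=5)
print(X.shape, target.shape)   # (309, 10) (309,)
```

With a delay of 5, the first five samples serve only as history, so 309 input/target pairs remain; each row holds 10 values (5 lagged outputs and 5 lagged exogenous inputs), which a neural network would then map to the one-step-ahead traffic value.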