J. Wang, X. Liao, and Z. Yi (Eds.): ISNN 2005, LNCS 3498, pp. 397-402, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Internet Traffic Prediction by W-Boost: Classification and Regression

Hanghang Tong¹, Chongrong Li², Jingrui He¹, and Yang Chen¹

¹ Department of Automation, Tsinghua University, Beijing 100084, China
{walkstar98,hejingrui98}@mails.tsinghua.edu.cn
² Network Research Center of Tsinghua University, Beijing 100084, China
licr@cernet.edu.cn

Abstract. Internet traffic prediction plays a fundamental role in network design, management, control, and optimization. The self-similar and non-linear nature of network traffic makes highly accurate prediction difficult. In this paper, we propose a new boosting scheme, namely W-Boost, for traffic prediction from two perspectives: classification and regression. To capture the non-linearity of the traffic while keeping the complexity of the algorithm low, stumps and piecewise-constant functions are adopted as weak learners for classification and regression, respectively. Furthermore, a new weight update scheme is proposed to take advantage of the correlation information within the traffic for both models. Experimental results on real network traffic that exhibits both self-similarity and non-linearity demonstrate the effectiveness of the proposed W-Boost.

1 Introduction

Internet traffic prediction plays a fundamental role in network design, management, control, and optimization [12]. Essentially, the statistical properties of network traffic determine how predictable it is [2], [12]. Two of the most important discoveries about the statistics of Internet traffic over the last ten years are that Internet traffic exhibits self-similarity (in many situations also referred to as long-range dependence) and non-linearity. Since Will E. Leland's pioneering work in 1993, many researchers have dedicated themselves to showing that Internet traffic is self-similar [10].
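One standard way to check for self-similarity in a traffic trace is the variance-time (aggregated-variance) plot. For a second-order self-similar process with Hurst exponent H, the variance of the series averaged over non-overlapping blocks of size m decays like m^(2H-2). An H near 0.5 therefore indicates short-range dependence, and H in (0.5, 1) indicates long-range dependence. The sketch below is an illustration, not part of W-Boost. It assumes NumPy is available, and the function name `aggregated_variance_hurst` and its parameters `num_scales` and `min_blocks` are hypothetical choices made for this example.

```python
import numpy as np


def aggregated_variance_hurst(x, num_scales=20, min_blocks=100):
    """Hypothetical helper: estimate the Hurst exponent H of a series with
    the aggregated-variance (variance-time) method.

    Var(X^(m)) ~ m**(2H - 2), so the slope beta of log Var(X^(m)) versus
    log m gives H = 1 + beta / 2.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    max_m = n // min_blocks  # keep at least `min_blocks` blocks at every scale
    if max_m < 2:
        raise ValueError("series too short for the requested min_blocks")
    # Log-spaced block sizes from 1 to max_m; np.unique drops duplicates.
    ms = np.unique(np.logspace(0, np.log10(max_m), num_scales).astype(int))
    variances = []
    for m in ms:
        k = n // m  # number of complete blocks of size m
        block_means = x[: k * m].reshape(k, m).mean(axis=1)
        variances.append(block_means.var())
    # Least-squares slope of the variance-time plot in log-log coordinates.
    beta, _ = np.polyfit(np.log(ms), np.log(variances), 1)
    return 1.0 + beta / 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=2**16)  # i.i.d. samples: no long-range dependence
    print(round(aggregated_variance_hurst(noise), 2))  # expected to be near 0.5
```

Requiring at least `min_blocks` blocks per scale trades range on the m axis for stabler variance estimates. With very few blocks, the sample variance at large m becomes too noisy to trust the fitted slope. On a real trace, the per-interval byte or packet counts would be passed in place of `noise`.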
On the other hand, Hasegawa et al. [6] demonstrated that Internet traffic is non-linear by using the surrogate data method [16]. The discovery of the self-similarity and non-linearity of network traffic has brought new challenges to traffic prediction [12].

In the past several decades, many methods have been proposed for network traffic prediction. To deal with the self-similar nature of network traffic, the authors in [15] proposed using FARIMA, since FARIMA is a behavioral model for self-similar time series [4]. The authors in [19] proposed predicting in the wavelet domain, since wavelets are a natural way to describe the multi-scale characteristic of self-similarity. While these methods do improve prediction performance for self-similar time series, both are time-consuming. To deal with the non-linear nature of network traffic, Arti-

This work is supported by the National Fundamental Research Development Program (973) under contract 2003CB314805.