DAILY VOLUME FORECASTING USING HIGH FREQUENCY PREDICTORS

Leandro G. M. Alvim
Departamento de Informatica
Pontificia Universidade Catolica do Rio de Janeiro
Rio de Janeiro, RJ, Brazil
email: leandrouff@gmail.com

Cicero N. dos Santos
Departamento de Informatica
Pontificia Universidade Catolica do Rio de Janeiro
Rio de Janeiro, RJ, Brazil
email:

Ruy L. Milidiu
Departamento de Informatica
Pontificia Universidade Catolica do Rio de Janeiro
Rio de Janeiro, RJ, Brazil
email: milidiu@inf.puc-rio.br

ABSTRACT
Daily volume is an important feature of financial market structure. Effective daily volume forecasting can help areas such as portfolio management and algorithmic trading. Intraday updates of daily volume forecasts can exploit high-frequency data to provide more accurate forecasts. Previous work on daily volume forecasting usually uses Bayesian methods. In our work, we approach the problem of daily volume forecasting using intraday information. Forecasting is accomplished with two machine learning predictors: Support Vector Regression (SVR) and Partial Least Squares (PLS). We empirically test our method on the top nine high-liquidity stocks traded on Bovespa. Our metrics are the percentage error and the relative error reduction against a naive strategy. Our results show that SVR and PLS provide accurate forecasts. Moreover, forecasting accuracy improves throughout the day as more intraday information becomes available.

KEY WORDS
Finance, Volume Forecasting, Machine Learning, PLS, SVR.

1 Introduction
Investors who want to minimize the market impact of their order executions have extensively investigated the execution process and its consequences when trading assets. With the rise of algorithmic trading, many institutional investors are using computer-based algorithms and pattern recognition techniques. It is quite rare to find human intervention in order execution.
Nowadays, the total amount of orders executed by computer-based traders is large and increasing. According to Chordia et al. (2008), algorithmic trading has been reducing the average trade size in the market, and institutional investors are forced to split their orders to obtain better execution prices. One such strategy is the Volume Weighted Average Price (VWAP) strategy. It splits a specific number of shares into smaller orders during the day, executing them at different prices. This splitting procedure aims at executing close to the VWAP. One of the interesting aspects of this strategy is that accurate volume forecasting can lead to accurate VWAP execution.

Portfolio management and asset allocation require the acquisition or liquidation of positions, placing large amounts of orders that could change the price of an asset. This price change is closely associated with transaction risk and can result in lower profits or higher losses. There is no simple solution to this problem; to minimize this risk, the investor can take into account, for example, asset volume, financial market rules, volatility, and asset correlations.

Despite the importance of volume forecasting to algorithmic trading and portfolio management, few works on volume can be found in the finance literature. In [Bialkowski et al., 2005] and [Bialkowski et al., 2006], a new methodology is proposed. It consists of decomposing volume for intraday volume forecasting. Their work used ARIMA and SETAR models, which allowed a significant reduction in VWAP order risk. Their data consisted of the forty stocks of the CAC40 index. In [Lean et al., 2008], a new kernel-based ensemble learning approach is proposed. They use econometric models and artificial intelligence techniques to predict China's foreign trade volume. In [Lux and Kaizoji, 2007], the predictability of daily volume and volatility of Japanese stocks is investigated.
They compare ARFIMA and FIGARCH long-memory models with GARCH and ARIMA short-memory models in order to predict volume 100 days ahead.

Our main contribution is a dynamic volume model that uses high-frequency machine learning predictors for daily volume forecasting. We update our model throughout the day, aiming to reduce the daily forecasting error. The update mechanism uses the intraday volume already observed during the day. To forecast volume during the day, we use Support Vector Regression (SVR) and Partial Least Squares (PLS). For the experiments, we use the Bovespa data set. This data set contains the 15-min intraday volume of 9 stocks, three of which are high-liquidity stocks and the