Received January 18, 2021, accepted February 10, 2021, date of publication February 12, 2021, date of current version February 25, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3059187

Ensembling and Dynamic Asset Selection for Risk-Controlled Statistical Arbitrage

SALVATORE M. CARTA¹, (Member, IEEE), SERGIO CONSOLI², ALESSANDRO SEBASTIAN PODDA¹, DIEGO REFORGIATO RECUPERO¹, AND MARIA MADALINA STANCIU¹
¹ Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
² European Commission, Joint Research Centre (DG-JRC), Directorate A-Strategy, Work Programme and Resources, Scientific Development Unit, I-21027 Ispra, Italy
Corresponding author: Sergio Consoli (sergio.consoli@ec.europa.eu)

We would like to thank the Centre for Advanced Studies at the Joint Research Centre of the European Commission for guidance and support during the development of this research work. This work was also partially supported by the POR FESR 2014-2020 project: "AlmostAnOracle - AI and Big Data Algorithms for Financial Time Series Forecasting."

ABSTRACT In recent years, machine learning algorithms have been successfully employed to identify hidden patterns in financial market behavior and, consequently, have opened up opportunities for financial applications such as algorithmic trading. In this paper, we propose a statistical arbitrage trading strategy with two key elements: an ensemble of regression algorithms for asset return prediction, followed by a dynamic asset selection. More specifically, we construct an extremely heterogeneous ensemble ensuring model diversity by using state-of-the-art machine learning algorithms, data diversity by using a feature selection process, and method diversity by using individual models for each asset, as well as models that learn cross-sectionally across multiple assets.
Then, their predictive results are fed into a quality assurance mechanism that prunes assets with poor forecasting performance in the previous periods. We evaluate the approach on historical data of component stocks of the S&P500 index. By performing an in-depth risk-return analysis, we show that this setup outperforms highly competitive trading strategies considered as baselines. Experimentally, we show that the dynamic asset selection enhances overall trading performance in terms of both return and risk. Moreover, the proposed approach yielded superior results during both financial turmoil and massive market growth periods, and it proved generally applicable to any risk-balanced trading strategy aiming to exploit different asset classes.

INDEX TERMS Stock market forecast, statistical arbitrage, machine learning, ensemble learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Bohui Wang.

I. INTRODUCTION
Statistical arbitrage trading, or StatArb for short, exploits statistical patterns in the dynamics of security prices to obtain, with high probability, a return larger than the risk-free return. StatArb traces its roots to the pairs trading strategy [2] and was first developed on Wall Street in the mid-1980s by a quantitative trading group at Morgan Stanley led by Nunzio Tartaglia [3]. Pairs trading, a simplified form of StatArb, involves forming portfolios of two related stocks with relatively close pricing. The intuition behind pairs trading is to exploit the spread of expected returns of financial assets. When using such a trading strategy, investors go long on the underpriced asset with the highest expected return and short the portfolio of assets with the lowest expected return. These strategies typically make a large number of individual independent trades with a positive expected return, thereby reducing the risk of the strategy.
The arbitrage opportunities exist as a consequence of market inefficiency, and profits are realized by taking trading positions that pay off when the mispricing of the assets corrects itself in the future. Moreover, because the spread between assets' prices is considered to be uncorrelated with market returns, pairs trading and, by extension, StatArb, are market-neutral strategies.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ (Volume 9, 2021, page 29942)

At a high level, StatArb involves automatically trading a set of assets that make up a portfolio [4] and comprises two phases: (i) the scoring phase, where each asset is assigned a relevance score, with high scores indicating assets that should be held long and low scores indicating assets that are candidates for short operations; and (ii) the risk reduction