Adaptive random forests for data stream regression

Heitor Murilo Gomes 1, Jean Paul Barddal 1,2, Luis Eduardo Boiko 2, Albert Bifet 1

1- Department of Computer Science and Networks (INFRES), Télécom ParisTech, Université Paris-Saclay, Paris, France
2- Programa de Pós-Graduação em Informática (PPGIa), Pontifícia Universidade Católica do Paraná, Curitiba, Brazil

Abstract. Data stream mining is a hot topic in the machine learning community that tackles the problem of learning and updating predictive models as new data becomes available over time. Even though several new methods are proposed every year, most focus on the classification task and overlook the regression task. In this paper, we propose an adaptation of the Adaptive Random Forest that handles regression tasks, named ARF-Reg. ARF-Reg is empirically evaluated and compared to state-of-the-art data stream regression algorithms, highlighting its applicability in different data stream scenarios.

1 Introduction

Data stream mining is an important topic in the machine learning community. It tackles the problem of learning and updating predictive models as new data becomes available over time. Even though several new methods are proposed every year, most focus on the classification task and overlook the regression task. Important examples of regression include temperature and precipitation forecasts, and stock market and household price predictions. Furthermore, the data distribution in these examples may be ephemeral, in the sense that it can change over time. For instance, the temperature and precipitation rates of a region may change due to unexpected environmental accidents, or the price of a stock may drop sharply if the company is found to be involved in corruption schemes.

In this paper, we adapt the Adaptive Random Forest (ARF) learner presented in [1] to the regression task; the resulting method is hereafter referred to as ARF-Reg.
ARF-Reg was implemented in the Massive Online Analysis (MOA) framework and will be made publicly available for further studies in the area.

The remainder of this paper is organized as follows. Section 2 describes the data stream regression task and its challenges. Section 3 reviews related work. Section 4 describes the proposed method, which is then evaluated in Section 5. Finally, Section 6 concludes the paper and outlines future work.

ESANN 2018 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 25-27 April 2018, i6doc.com publ., ISBN 978-287587047-6. Available from http://www.i6doc.com/en/.

2 Problem Definition

Despite the impressive amount of effort put into data stream mining, most works focus on classification and overlook both regression and clustering