Automated Saturation Mitigation Controlled by Deep Reinforcement Learning

Elkin Aguas*†, Anthony Lambert*, Grégory Blanc†, Hervé Debar†
*Orange Labs, Châtillon, France {name.surname}@orange.com
†Télécom SudParis, Institut Polytechnique de Paris, Evry-Courcouronnes, France {name.surname}@telecom-sudparis.eu

Abstract—Recent developments in orchestration and machine learning have made network automation more feasible, allowing the transition from error-prone and time-consuming manual manipulations to fast and refined automated responses in areas such as security and management. This article investigates the capabilities of a deep reinforcement learning agent to learn how to automatically share the prefix announcements of an Autonomous System among its neighbors, in order to mitigate undesired network behaviors and therefore increase network resiliency and security. Our work focuses on network saturation, tackling the problem of network responsiveness in today’s massive content delivery context. Results not only prove the feasibility of such an agent, but also demonstrate its ability to minimize traffic loss as well as the number of actions to be performed by the automation process.

Index Terms—deep reinforcement learning, network, automation, security, management, network resiliency, saturation.

I. INTRODUCTION

Automation and machine learning have already proven able to improve the detection and correction of security threats in networks [1]–[3]. Recent developments also suggest they are capable of optimizing the security, management and performance of networking infrastructures [4]–[6]. Taking advantage of the latest machine learning techniques would especially allow automation, transitioning from error-prone and time-consuming manual manipulations to faster and more refined automated actions in response to events.
This could, for instance, alleviate the control problem that Carriers and Internet Service Providers (ISPs) face in today’s massive content delivery context, which can result in traffic congestion issues. Indeed, Content Delivery Networks (CDNs) are not only responsible for a large part of today’s traffic, but they also fully control how content is delivered by their geographically distributed cache servers. In contrast, Carriers and ISPs are unable to predict where content traffic will enter their Autonomous System (AS). This lack of control increases their vulnerability to traffic saturation over their inter-domain links and can result in serious security issues (inter-domain session termination, for instance) that put the proper working state of the network at risk. They can, however, try to retain some control by sharing the announcements of their prefixes among their neighbors in order to influence the entry points of traffic into their AS. But this is a complex and very error-prone task to achieve manually, as it requires finding and maintaining the correct sharing of prefixes over time.

To this end, this paper evaluates the capability of a deep reinforcement learning (DRL) agent to improve the dynamic prefix load sharing of an AS in order to efficiently mitigate saturation, an issue in today’s massive content delivery context. More precisely, we (i) propose a generic network automation architecture that automatically handles events that put the proper operation of the network at risk, (ii) design a DRL agent which controls the choice of actions to be executed in case of saturation, and (iii) evaluate its performance under three scenarios and with two different reward functions.

This paper is organized as follows: Section II provides a description of the delivery and saturation issue. Section III introduces our solution. Section IV details how we evaluate its performance and discusses the results from our experiments.
Section V positions our work with respect to related work. Section VI concludes the paper.

II. DELIVERY AND SATURATION ISSUE DESCRIPTION

CDNs are built as overlay networks of geographically distributed servers. They implement complex and dynamic delivery strategies to select which server will deliver a given content to a given end-user at a given time. Such strategies range from choosing the servers closest to users, or the ones with the highest capacity, to optimizing the load on the CDN or even the CDN provider's transit costs. A direct consequence is that Carriers and ISPs turn into “dumb pipes”, as they have neither control over nor insight into how traffic is delivered to their customers. Therefore, even if content delivery is legitimate traffic and cannot be considered an attack, the massive and unpredictable traffic shifts it triggers can cause serious security issues. Indeed, they can lead to inter-domain link saturation, preventing users from accessing some services. Worse, control traffic, such as BGP keep-alive messages, can be lost, resulting in external BGP (eBGP) session termination.

Fig. 1 depicts the root cause of such a situation using a basic example. ISP AS owns three prefixes: 2.0.0.0/24, 2.0.1.0/24 and 2.0.2.0/24. The content is replicated into three possible caches: ISP Cache, physically placed in the ISP AS’s network but seen as another AS from a connection point of view (state-of-the-art practice); Direct Cache located in the
978-1-7281-6992-7/20/$31.00 ©2020 IEEE