Automated Saturation Mitigation Controlled by
Deep Reinforcement Learning
Elkin Aguas*†, Anthony Lambert*, Grégory Blanc†, Hervé Debar†
*Orange Labs, Châtillon, France
{name.surname}@orange.com
†Télécom SudParis, Institut Polytechnique de Paris, Evry-Courcouronnes, France
{name.surname}@telecom-sudparis.eu
Abstract—Recent developments in orchestration and machine
learning have made network automation more feasible, allowing
the transition from error-prone and time-consuming manual
manipulations to fast and refined automated responses in areas
such as security and management. This article investigates the
capabilities of a deep reinforcement learning agent to learn how
to automatically share prefix announcements of an Autonomous
System to its neighbors, in order to mitigate undesired network
behaviors and therefore increase network resiliency and security.
Our work focuses on network saturation, tackling the problem
of network responsiveness in today’s massive content delivery
context. Results not only prove the feasibility of such an agent, but
also demonstrate its ability to minimize both traffic loss and the
number of actions to be performed by the automation process.
Index Terms—deep reinforcement learning, network, automation,
security, management, network resiliency, saturation.
I. INTRODUCTION
Automation and machine learning have already proven that they
can improve the detection and correction of security threats in
networks [1]–[3]. Recent developments also suggest that they can
optimize the security, management and performance
of networking infrastructures [4]–[6]. In particular, the
latest machine learning techniques would enable automation,
replacing error-prone and time-consuming
manual manipulations with faster and more refined automated
responses to events. This could, for instance, alleviate
the control problem that Carriers and Internet Service
Providers (ISPs) face in today’s massive content delivery
context, which can result in traffic congestion.
Indeed, Content Delivery Networks (CDNs) are not only
responsible for a large part of today’s traffic, but they also
fully control how content is delivered by their geographically
distributed cache servers. In contrast, Carriers and ISPs
are unable to predict where content traffic will enter their
Autonomous System (AS). This lack of control increases their
vulnerability to traffic saturation over their inter-domain links
and can result in serious security issues (inter-domain session
termination, for instance) that put the proper working
state of the network at risk. They can, however, try to retain some
control by sharing the announcements of their prefixes among their
neighbors in order to influence the entry points of traffic
into their AS. Doing so manually is complex and very error-prone,
as it requires finding and maintaining the
correct sharing of prefixes over time.
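The effect of prefix sharing on inter-domain link load can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the link names, the traffic volumes, the capacities, and in particular the rule that traffic toward a prefix splits evenly over the links on which it is announced (in practice the entry point is decided by remote ASes and CDNs, which is precisely the control problem described above).

```python
from collections import defaultdict


def link_loads(announcements, prefix_traffic):
    """Return the inbound load (Mbit/s) carried by each inter-domain link.

    announcements: prefix -> list of links on which the prefix is announced
    prefix_traffic: prefix -> inbound demand toward that prefix (Mbit/s)
    Simplifying assumption: demand splits evenly over the announcing links.
    """
    loads = defaultdict(float)
    for prefix, links in announcements.items():
        if not links:
            continue  # a prefix announced nowhere attracts no traffic
        share = prefix_traffic.get(prefix, 0.0) / len(links)
        for link in links:
            loads[link] += share
    return dict(loads)


def saturated_links(loads, capacity):
    """Return the sorted names of links whose load exceeds their capacity."""
    return sorted(link for link, load in loads.items() if load > capacity[link])


# Hypothetical demand and capacities (Mbit/s); link names are illustrative.
traffic = {"2.0.0.0/24": 60.0, "2.0.1.0/24": 50.0, "2.0.2.0/24": 40.0}
capacity = {"transit": 100.0, "peer": 100.0}

# All prefixes announced on a single link: that link saturates.
before = {p: ["transit"] for p in traffic}
# Two prefixes moved to the other link: the load is spread.
after = {
    "2.0.0.0/24": ["transit"],
    "2.0.1.0/24": ["peer"],
    "2.0.2.0/24": ["peer"],
}

print(saturated_links(link_loads(before, traffic), capacity))  # ['transit']
print(saturated_links(link_loads(after, traffic), capacity))   # []
```

Even in this toy setting, the right assignment depends on the current demand per prefix, which changes over time; this is what makes a manual approach hard to maintain.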
To this end, this paper evaluates the capability of a deep
reinforcement learning (DRL) agent to improve the dynamic
prefix load sharing of an AS in order to efficiently mitigate
saturation, an issue in today’s massive content delivery context.
More precisely, we (i) propose a generic network automation
architecture that automatically handles events that put the
proper operation of the network at risk, (ii) design a DRL agent which controls
the choice of actions to be executed in case of saturation, and (iii)
evaluate its performance under three scenarios and with two
different reward functions.
This paper is organized as follows: Section II provides a
description of the delivery and saturation issue. Section III
introduces our solution. Section IV details how we evaluate
its performance and discusses the results of our experiments.
Section V positions our work with respect to related works.
Section VI concludes the paper.
II. DELIVERY AND SATURATION ISSUE DESCRIPTION
CDNs are built as overlay networks of geographically
distributed servers. They implement complex and dynamic
delivery strategies to select which server will deliver a given
content to a given end-user at a given time. Such strategies
range from choosing the servers closest to users, or those
with the highest capacity, to optimizing the load on
the CDN or even the CDN provider’s transit costs. A direct
consequence is that Carriers and Internet Service Providers (ISPs)
turn into “dumb pipes”, as they have neither control over nor
insight into how traffic is delivered to their customers. Therefore,
even though content delivery is legitimate traffic and cannot
be considered an attack, the massive and unpredictable traffic
shifts it triggers can have serious security consequences. Indeed, they
can saturate inter-domain links, preventing users from
accessing some services. Worse, control traffic such as
BGP keep-alive messages can be lost, resulting in the termination
of external BGP (eBGP) sessions.
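The link between lost keep-alive messages and session termination can be sketched as follows. This is a simplified model, not a BGP implementation: the function name and the saturation window are hypothetical, and the timer values are the defaults suggested by RFC 4271 (hold time of 90 s, keep-alives sent every third of the hold time), which real deployments often change. The hold timer is restarted whenever a KEEPALIVE or UPDATE message is received; if it expires, the session is torn down.

```python
def hold_timer_expiry(received_at, hold_time=90.0, until=None):
    """Return the time (s) at which the hold timer expires, or None.

    received_at: times at which a KEEPALIVE or UPDATE actually arrived
    hold_time: negotiated hold time in seconds (90 s is the RFC 4271 suggestion)
    until: end of the observation window, if any
    The timer is assumed to start at t = 0, when the session is established.
    """
    deadline = hold_time
    for t in sorted(received_at):
        if t >= deadline:
            return deadline  # timer expired before this message arrived
        deadline = t + hold_time  # message received in time: restart the timer
    if until is not None and until >= deadline:
        return deadline
    return None


# The peer sends a keep-alive every 30 s, from t = 30 s to t = 300 s.
sent = [30.0 * k for k in range(1, 11)]

# Hypothetical saturation between t = 100 s and t = 220 s drops every message.
received = [t for t in sent if not (100.0 <= t <= 220.0)]

print(hold_timer_expiry(sent, until=300.0))      # None: session stays up
print(hold_timer_expiry(received, until=300.0))  # 180.0: session is terminated
```

In this sketch, a saturation episode longer than the hold time is enough to bring the eBGP session down, even though the link itself never fails.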
978-1-7281-6992-7/20/$31.00 ©2020 IEEE
Fig. 1 illustrates the root cause of such a situation with
a basic example. The ISP AS owns three prefixes: 2.0.0.0/24,
2.0.1.0/24 and 2.0.2.0/24. The content is replicated into three
possible caches: ISP Cache, physically placed in the ISP AS’s
network but seen as another AS from a connection point of
view (state-of-the-art practice); Direct Cache, located in the