iBGP2: a scalable iBGP redistribution mechanism leading to optimal routing

Marc-Olivier Buob, Nokia Bell Labs, marc-olivier.buob@nokia.com
Anthony Lambert, Orange Labs, anthony.lambert@orange.com
Steve Uhlig, Queen Mary University of London, steve.uhlig@qmul.ac.uk

Abstract: The Internet is made of almost 50,000 ASes exchanging routing information thanks to BGP. Inside each AS, information is redistributed via iBGP sessions. This allows each router to map a destination exterior to the AS to a given egress point. The main redistribution mechanisms used today (iBGP full mesh, Route Reflectors, and BGP confederations) either guarantee selection of the best egress point or enhance scalability, but not both. In this paper, we propose a new way to perform iBGP redistribution in an AS based on its IGP topology, reconciling optimality in route selection with scalability. Our contribution is threefold. First, we demonstrate the tractability of our approach and its benefits. Second, we provide an open-source implementation of our mechanism based on Quagga. Third, we illustrate the feasibility of our approach through simulations performed under ns-3 and compare its performance with full mesh and Route Reflection.

I. BACKGROUND

A. Context

The Internet is made of about 50,000 interconnected Autonomous Systems (ASes). Each AS consists of a set of networks and routers under the control of a given administrative authority (e.g., carrier, Internet Service Provider (ISP), Content Provider). To ensure reachability in the Internet, ASes exchange routing information about the networks they can reach. This is achieved by establishing exterior Border Gateway Protocol (eBGP) sessions [25] between the AS border routers (ASBRs) of neighboring ASes. The routing information learned by ASBRs is then redistributed inside the AS through internal Border Gateway Protocol (iBGP) sessions, established between the routers of the AS.
This way, BGP populates the routing tables of network equipment inside the AS with routes to destinations external to the AS. The key idea is to select, for each external destination, an egress point called the BGP next-hop. Each piece of network equipment has to be able to reach BGP next-hops, which is usually achieved through the use of an Interior Gateway Protocol (IGP) such as OSPF [20] or IS-IS [22]. In such protocols, weights are assigned to physical links. Adjacency states are flooded between routers across the IGP network, enabling routers to build a map of the network and compute their shortest paths to any interior destination using Dijkstra's algorithm.

In general, any exterior destination can be reached through several egress points. BGP routers therefore have to elect a single best egress point among all candidates. They do so by running their BGP decision process (see Figure 1). The first steps of the BGP decision process are concerned with interdomain metrics: AS economic policy and interdomain path length. The next steps focus on intradomain aspects: they make it possible to implement either cold-potato routing (using the MED attribute) or hot-potato routing (by selecting the closest egress point in terms of IGP cost).

Fig. 1. Steps of the BGP decision process considered in this paper. Some steps are omitted for the sake of simplicity and without loss of generality.

BGP is an incremental protocol, i.e., for each destination a router only announces its preferred path to its neighbors. Moreover, routing information learned through an iBGP session should not be readvertised through another iBGP session. As a consequence, depending on the iBGP network topology, there is no guarantee that a router will learn its best possible egress point to a destination. The simplest solution to overcome this issue consists in establishing an iBGP session between every pair of routers, i.e., using an iBGP full mesh.
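The decision process outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce Figure 1 exactly: the `Route` fields, the `best_route` function, and the final tie-break on the next-hop identifier are hypothetical names and choices made for illustration, and steps such as preferring eBGP-learned over iBGP-learned routes are left out. The sketch assumes the usual ordering (highest local preference, shortest AS path, lowest MED among routes from the same neighboring AS, lowest IGP cost to the next-hop).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of a BGP route to one destination."""
    next_hop: str      # egress point (BGP next-hop)
    local_pref: int    # economic policy: higher is preferred
    as_path_len: int   # interdomain path length: shorter is preferred
    med: int           # cold-potato hint: lower is preferred
    neighbor_as: int   # neighboring AS that announced the route
    igp_cost: int      # IGP distance from this router to next_hop


def best_route(routes: Iterable[Route]) -> Route:
    """Elect one route by applying the simplified steps in order."""
    candidates = list(routes)
    if not candidates:
        raise ValueError("no candidate route")

    # Step 1: keep the routes with the highest local preference.
    top_pref = max(r.local_pref for r in candidates)
    candidates = [r for r in candidates if r.local_pref == top_pref]

    # Step 2: keep the routes with the shortest AS path.
    shortest = min(r.as_path_len for r in candidates)
    candidates = [r for r in candidates if r.as_path_len == shortest]

    # Step 3: MED is only compared among routes from the same neighboring AS.
    candidates = [
        r for r in candidates
        if not any(o.neighbor_as == r.neighbor_as and o.med < r.med
                   for o in candidates)
    ]

    # Step 4: hot potato, i.e. closest egress point in terms of IGP cost.
    # Remaining ties are broken on the next-hop identifier (an assumption).
    return min(candidates, key=lambda r: (r.igp_cost, r.next_hop))


routes = [
    Route("r1", 100, 3, 0, 65001, 10),
    Route("r2", 100, 2, 20, 65001, 50),
    Route("r3", 100, 2, 10, 65001, 70),
    Route("r4", 100, 2, 0, 65002, 30),
]
print(best_route(routes).next_hop)                    # r4
print(best_route([routes[0], routes[2]]).next_hop)    # r3
```

The second call shows why visibility matters: a router that only learns the routes via r1 and r3 elects r3, although r4 would be its best egress point had that route been redistributed to it.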
With a full mesh, each router hence learns the best route selected by every other router for every destination. However, such an approach scales poorly in terms of configuration overhead, number of messages (each routing change is notified to all routers), and memory consumption (each router maintains an Adj-RIB-in and an Adj-RIB-out per neighbor). Consequently, an iBGP full mesh is traditionally only used in small ASes.

Large ASes, on the other hand, rely on approaches such as Route Reflection [4] or BGP confederations [28]. A Route Reflector (RR) is a router which is allowed to readvertise some iBGP routing information. More precisely, some of the RR's iBGP peers can be configured as RR clients. An RR is allowed to redistribute a route learned from an RR client to every peer, whereas routes learned from non-clients are redistributed only to RR clients. BGP confederations consist in splitting an AS into a set of sub-ASes, which exchange routing information through eBGP. These solutions make it possible to design scalable network topologies, but can lead to arbitrary filtering of routing information, and potentially to issues in route selection, e.g., oscillations [2], [13], non-optimality [10],