0-7803-7016-3/01/$10.00 ©2001 IEEE
nanoProtean: Scalable System Software for a
Gigabit Active Router
David Craig, Hwangnam Kim, Raghupathy Sivakumar, Vaduvur Bharghavan, and Constantine
Polychronopoulos
Abstract— We introduce nanoProtean, a new router operating system
and execution environment that reduces system overhead, making it pos-
sible to process the packets produced by gigabit networks. The overhead
decreases as the offered load per packet increases due to the following features
in nanoProtean: (i) a completely preemptable operating system, (ii)
efficient management of the system’s job queue, and (iii) system support
for fine-grain sharing of processing time amongst packets. These features
are a result of a novel integration of efficient thread scheduling for multi-
processors and interrupt handling.
To test our system’s scalability, our experimental analysis uses a technique
that emulates processing requests generated in real time at 802.3z (gigabit)
line speeds and greater. Our results demonstrate 2 Gbps routing with
MAE-East tables on two processors, and system overheads decreasing from
3.6 µs per packet to 1.64 µs per packet on one processor. By reducing system
overhead, we also demonstrate that nanoProtean enables active networking
in a router supporting gigabit connections.
Keywords— Scalable systems, active networks, preemptable context
switch, nonblocking priority scheduling
I. INTRODUCTION
Active networks offer promise to improve bandwidth utilization
compared to conventional packet routers. Active
routers are programmable on a per connection or even a per
packet basis. This programmability makes these routers very
flexible, capable of allocating their finite bandwidth and pro-
cessing capacity in an intelligent manner. Further, new proto-
cols that are developed after a router is deployed can be injected
on-the-fly to evolve router behavior.
Previous approaches in the literature have investigated providing
programmable services at Fast Ethernet line speeds [1][2][3][4][5], implementing
interfaces similar to the one specified by the NodeOS interface[6] for
programmable execution environments. In the design of nanoProtean we
reconsider some of the tradeoffs made in these projects. The shift in our
design is the goal of routing traffic produced at 802.3z line speeds through
remotely injected services. Our design is the blueprint for a programmable
router with the same routing capacity as commercially avail-
able routers such as Cisco’s 7500 line of routers with VIP-4 line
cards. While the line speeds are similar, the VIP-4 processors
are not capable of accepting and then executing remotely injected
code while the router continues to operate. However, the
processing capacity of the line cards is substantially greater
than that of the hardware used in our analysis, owing to special-purpose
ASICs. The operating system design presented here does not preclude
scheduling for heterogeneous systems with line card-like
accelerators.
Coordinated Science Laboratory, University of Illinois at Champaign-Urbana,
E-mail: dcraig@csrd.uiuc.edu, {hnkim, sivakumr, bharghav}@crhc.uiuc.edu,
cdp@csrd.uiuc.edu
The flexibility vs. performance tradeoffs necessarily favor a
more efficient, multiprocessing execution environment. A gigabit
line speed leaves a window of approximately 8k instructions
to process each 1 KB packet on a fully utilized, dual-issue 500 MHz
processor. The ability to efficiently change resource scheduling is
also a serious concern, since a 1 KB packet arrives every 7.6 µs and
may require preempting the packet currently being processed.
These tradeoffs have been considered at the network level in the
Protean architecture[7], limiting the amount of processing per
packet and the amount of state the router is expected to maintain
without significantly compromising observed router flexibility.
nanoProtean is an operating system and execution environment
to implement the router specification with commodity hardware
for gigabit line speeds.
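The per-packet budget quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that "1k" means 1024 bytes and that "gigabit" means 2**30 bit/s, neither of which the text states explicitly, and the function names are hypothetical. Under those assumptions the arrival interval comes out near 7.6 µs, and decimal units (1000 bytes at 10**9 bit/s) give a budget of 8000 instructions.

```python
# Back-of-the-envelope budget for processing one packet at gigabit line speed.
# Assumptions (not stated in the text): "1k" = 1024 bytes, "gigabit" = 2**30 bit/s.

def packet_time_s(packet_bytes=1024, line_rate_bps=2**30):
    """Time in seconds for one packet to arrive at the given line rate."""
    return packet_bytes * 8 / line_rate_bps


def instruction_budget(packet_bytes=1024, line_rate_bps=2**30,
                       clock_hz=500e6, issue_width=2):
    """Instructions a fully utilized processor can issue in one packet time."""
    return packet_time_s(packet_bytes, line_rate_bps) * clock_hz * issue_width


if __name__ == "__main__":
    # Binary units: about 7.63 microseconds and about 7629 instructions.
    print(f"{packet_time_s() * 1e6:.2f} us per packet")
    print(f"{instruction_budget():.0f} instructions per packet")
    # Decimal units (1000 bytes at 1e9 bit/s): 8 microseconds, 8000 instructions.
    print(f"{instruction_budget(1000, 1e9):.0f} instructions per packet")
```

Either reading is consistent with "approximately 8k instructions", and the system overhead per packet has to fit inside this budget alongside the routing work itself.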
nanoProtean combines a Shared Arena, which exposes instantaneous
resource scheduling, with nonblocking data structures to improve
system efficiency. Nonblocking data structures[8] have been used
previously to implement efficient regular[9] and real-time operating
system services[10] and priority scheduling[11]. The novel approach
for routing is to attach a kernel schedulable entity (kernel thread)
to every packet, together with an operating system design that
provides threading efficiently without ever having to disable
interrupts, even during a context switch. The thread allocation
approach is a conscious choice to optimize for the programmable
service path through the router, or slow-path. This choice increases
scheduling flexibility, since a service can suspend the processing
of a packet with a single thread suspend call. A counterargument in
favor of optimized software fast-paths is that they avoid the
explicit thread allocation and initialization overhead. Instead, the
fast-path can be optimized to the point where its execution time
is bounded and resource scheduling flexibility isn’t significantly
compromised. We assume that the availability of more complex
programmable services will be exploited to the exclusion
of fast-path code. Our design demonstrates a simpler approach
where the operating system isn’t special cased for conditions
where a kernel stack is borrowed for interrupt processing.
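As a loose illustration of the per-packet threading model described above, the sketch below gives every packet its own schedulable entity and lets a service give up the processor with a single call. It is a user-space analogy only, not nanoProtean code: Python generators stand in for kernel threads, `yield` stands in for the thread suspend call, a binary heap stands in for the priority queue, and the names `service` and `run` are hypothetical.

```python
import heapq
from itertools import count


def service(packet_id, steps):
    """Hypothetical per-packet service that performs `steps` units of work."""
    for step in range(steps):
        # One unit of packet processing would happen here.
        yield (packet_id, step)  # stand-in for a single thread suspend call


def run(arrivals):
    """Run one schedulable entity per packet in priority order.

    `arrivals` is a list of (priority, packet_id, steps) tuples, where a
    lower priority number runs first. Returns the executed (packet_id, step)
    units in order.
    """
    ready = []
    seq = count()  # tie-breaker: equal priorities are served round-robin
    for priority, packet_id, steps in arrivals:
        heapq.heappush(ready, (priority, next(seq), service(packet_id, steps)))

    trace = []
    while ready:
        priority, _, thread = heapq.heappop(ready)
        try:
            trace.append(next(thread))  # resume until the next suspend point
        except StopIteration:
            continue  # this packet has been fully processed
        heapq.heappush(ready, (priority, next(seq), thread))
    return trace


if __name__ == "__main__":
    print(run([(1, "a", 2), (0, "b", 2)]))
```

Because each packet carries its own execution state, the scheduler can stop and resume work on any packet without the service having to save or restore anything itself. Unlike this cooperative toy, the design in the text is preemptive and never disables interrupts.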
To summarize, the contributions of this work are:
• A general purpose OS design that is always interruptible, including
during a context switch.
• An efficient priority queue that decreases scheduling overhead
as the number of running threads increases.
• A router system that exploits this OS’s efficiency and scalability
to linearly increase the bandwidth processable with more
CPUs. (This is achieved with dynamically scheduled threads and not
statistically scheduled packets.)
• A trace- and model-based emulation environment that recreates
in user-space the invocation of interrupt handlers. The accuracy
of these invocations is within the processor’s clock (i.e.,
theoretically, 1.82 ns in our experiments).
The efficiency of the underlying OS enables sufficient rout-
ing capacity improvements that real programmable services can
51 IEEE INFOCOM 2001