Performance of Exhaustive Matching Algorithms
for Input-Queued Switches
Yoohwan Kim
Electrical Engineering and Computer Science Department
Case Western Reserve University
Cleveland, OH 44106
H. Jonathan Chao
Electrical and Computer Engineering Department
Polytechnic University
Brooklyn, NY 11201
Abstract-- The virtual output queue (VOQ) architecture is
commonly used to avoid head-of-line blocking in input-queued
switches. Many algorithms have been developed for
transferring the cells from the VOQs to the output ports.
Traditional iterative algorithms such as iSLIP and DRRM
achieve 100% throughput under uniform traffic, but their
throughput drops significantly under non-uniform traffic. Recently, a new
paradigm of exhaustive matching (EM) has been introduced for
handling non-uniform traffic while preserving the complexity of
traditional iterative algorithms. In EM, a VOQ is served
continuously until it becomes empty. Only the input ports that
have finished serving a VOQ look for a new match. This strategy
produces very good throughput and delay performance under
uniform and non-uniform traffic. However, under some traffic
patterns, starvation occurs when a VOQ occupies an
output port for an extended period of time. This problem can be
eliminated by providing a priority service for a VOQ that has
waited an excessively long time. The resulting algorithm,
prioritized EM (PEM), eliminates starvation and achieves very
high throughput for many traffic patterns.
Keywords—Switch, Scheduling, Matching, Virtual Output
Queueing, Exhaustive service
I. INTRODUCTION
Input-queued switches are preferred for high-speed packet
switching because the memory speed remains comparable to
the input line rate[10]. However, the throughput of an input-queued
switch with a single FIFO queue per input is limited to 58.6% under
uniform traffic due to head-of-line (HOL) blocking[5]. A common
solution is the virtual output queue (VOQ) architecture[1], in which
each input maintains a separate queue for each output, as shown in
Fig. 1. Incoming cells are queued in the VOQs at the input
ports. Cells are transferred to output ports based on a specific
scheduling algorithm. Many algorithms have been proposed for
scheduling cell transfers in a VOQ switch. These
scheduling algorithms are generally grouped into maximum
weight matching (MWM) and maximum size matching
(MSM)[6]. Although maximum weight matching guarantees
100% throughput for all traffic patterns[3][9], it is considered
impractical due to its high complexity ($O(N^{3} \log N)$). Maximum
size matching has lower but still high complexity ($O(N^{5/2})$), so
approximation algorithms are used in practice. These
approximation algorithms reach a maximal match through multiple
iterations.
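The difference between a maximal match and a maximum size match can be illustrated with a small sketch. The code below is not taken from any of the cited algorithms; the function names `greedy_maximal_match` and `maximum_size_match` are hypothetical, ports are numbered from 0, and the brute-force search is only meant for very small N.

```python
from itertools import permutations

def greedy_maximal_match(requests):
    """Pair each input, in index order, with its first free requested output.

    requests[i] is the set of outputs for which input i has a queued cell.
    The result is maximal (no further pair can be added) but it is not
    guaranteed to be a maximum size match.
    """
    match = {}     # input -> output
    taken = set()  # outputs already matched
    for i, outs in enumerate(requests):
        for j in sorted(outs):
            if j not in taken:
                match[i] = j
                taken.add(j)
                break
    return match

def maximum_size_match(requests):
    """Brute-force maximum size match over all permutations (tiny N only)."""
    n = len(requests)
    best = {}
    for perm in permutations(range(n)):
        m = {i: j for i, j in enumerate(perm) if j in requests[i]}
        if len(m) > len(best):
            best = m
    return best

# Input 0 has cells for outputs 0 and 1; input 1 has cells for output 0 only.
requests = [{0, 1}, {0}]
print(greedy_maximal_match(requests))  # {0: 0}
print(maximum_size_match(requests))    # {0: 1, 1: 0}
```

In this example the greedy result cannot be extended, so it is maximal, yet it contains one pair while the maximum size match contains two.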
From an implementation point of view, matching
algorithms can be classified as 3-phase or 2-phase[4]. A 3-phase
algorithm follows these basic steps:
[Figure: inputs 1 to N, each with N VOQs (e.g., VOQ (1, 1), VOQ (1, 2), ..., VOQ (1, N)), feed an N x N non-blocking switch with outputs 1 to N.]
Fig. 1: VOQ architecture in input-queued switch
1. Request: Each input sends a request to every output
for which it has a queued cell.
2. Grant: Each output that receives requests chooses one
input and sends it a grant.
3. Accept: Each input that receives grants chooses one
grant and accepts it.
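The three steps above can be sketched as a single matching round. This is a minimal illustration under stated assumptions, not any specific published scheduler: the function name `three_phase_iteration` and the hooks `pick_grant` and `pick_accept` are hypothetical, ports are numbered from 0, and contention is resolved by simply taking the lowest index.

```python
def three_phase_iteration(voq_len, pick_grant=min, pick_accept=min):
    """One request-grant-accept round on an N x N switch.

    voq_len[i][j] is the number of cells queued at input i for output j.
    pick_grant and pick_accept are hypothetical contention-resolution hooks:
    each receives a non-empty list of port indices and returns one of them.
    Returns a dict mapping each matched input to its output.
    """
    n = len(voq_len)
    # Step 1 (request): input i requests every output j with a queued cell.
    requests = {j: [i for i in range(n) if voq_len[i][j] > 0] for j in range(n)}
    # Step 2 (grant): each output with at least one request grants one input.
    grants = {}  # input -> list of outputs that granted it
    for j, inputs in requests.items():
        if inputs:
            grants.setdefault(pick_grant(inputs), []).append(j)
    # Step 3 (accept): each input with at least one grant accepts one.
    return {i: pick_accept(outs) for i, outs in grants.items()}

voq_len = [
    [2, 1, 0],  # input 0 has cells for outputs 0 and 1
    [1, 1, 0],  # input 1 has cells for outputs 0 and 1
    [0, 0, 3],  # input 2 has cells for output 2
]
print(three_phase_iteration(voq_len))  # {0: 0, 2: 2}
```

Here both output 0 and output 1 grant input 0, which accepts only one of them, so input 1 and output 1 stay unmatched although VOQ (1, 1) holds a cell. A further iteration restricted to the unmatched ports would add that pair.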
Algorithms differ in how contention is resolved at step 2
among multiple requests and at step 3 among multiple grants.
All algorithms based on round-robin matching maintain a
pointer in each input arbiter and each output arbiter to
resolve the contention. The pointers are updated as
described in Table 1, which is modified from its original
appearance in [4]. A request or grant is selected by choosing
the first one found in round-robin order from the current pointer position.
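The round-robin selection and the iSLIP and FIRM pointer updates listed in Table 1 can be sketched as follows. The function names are hypothetical, ports are numbered from 0, and the sketch assumes that the search starts at the pointer position itself and wraps around modulo N. PIM is omitted because it selects a random location instead of keeping a pointer.

```python
def round_robin_pick(candidates, pointer, n):
    """Return the first candidate at or after `pointer`, wrapping modulo n.

    Assumption: the pointer position itself has the highest priority.
    """
    for offset in range(n):
        port = (pointer + offset) % n
        if port in candidates:
            return port
    return None  # no request (or no grant) to choose from

def update_output_pointer(pointer, granted, accepted, n, scheme="iSLIP"):
    """Output-arbiter pointer update following Table 1.

    granted  : input granted by this output, or None if it got no request
    accepted : True if the granted input accepted the grant
    """
    if granted is None:
        return pointer            # no request: unchanged
    if accepted:
        return (granted + 1) % n  # one location beyond the accepted one
    if scheme == "FIRM":
        return granted            # grant not accepted: the granted one
    return pointer                # iSLIP, grant not accepted: unchanged

def update_input_pointer(pointer, accepted_output, n):
    """Input-arbiter pointer update for iSLIP and FIRM following Table 1."""
    if accepted_output is None:
        return pointer                # no grant: unchanged
    return (accepted_output + 1) % n  # one location beyond the accepted one

# Output arbiter of a 4-port switch with its pointer at 2 and requests
# from inputs 0 and 1: the search wraps around and selects input 0.
granted = round_robin_pick({0, 1}, pointer=2, n=4)
print(granted)                                            # 0
print(update_output_pointer(2, granted, True, 4))         # 1
print(update_output_pointer(2, granted, False, 4))        # 2
print(update_output_pointer(2, granted, False, 4, "FIRM"))  # 0
```

The only difference between the two schemes in this sketch is the case of a grant that is not accepted: iSLIP leaves the output pointer where it was, while FIRM moves it to the granted input.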
The performance of the 3-phase algorithms for switch size
32 at full offered load is summarized in Table 2. N/A indicates
that the average delay values are not applicable because of the
enormous cell loss at offered load 1. For non-uniform traffic,
local hot spot traffic is used. (The detailed descriptions of
traffic patterns and simulation conditions appear in Section
II.C.) Table 2 shows that iSLIP and FIRM perform very well for
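TABLE 1: POINTER UPDATE IN 3-PHASE ALGORITHMS

Arbiter        | Event              | iSLIP [8]                            | FIRM [11]                            | PIM [1]
Input arbiter  | No grant           | Unchanged                            | Unchanged                            | Random location
Input arbiter  | Grant              | One location beyond the accepted one | One location beyond the accepted one | Random location
Output arbiter | No request         | Unchanged                            | Unchanged                            | Random location
Output arbiter | Grant accepted     | One location beyond the accepted one | One location beyond the accepted one | Random location
Output arbiter | Grant not accepted | Unchanged                            | The granted one                      | Random location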
0-7803-7802-4/03/$17.00 © 2003 IEEE