Performance of Exhaustive Matching Algorithms for Input-Queued Switches

Yoohwan Kim
Electrical Engineering and Computer Science Department
Case Western Reserve University
Cleveland, OH 44106

H. Jonathan Chao
Electrical and Computer Engineering Department
Polytechnic University
Brooklyn, NY 11201

Abstract-- Virtual output queue (VOQ) architecture is commonly used for avoiding head-of-line blocking in input-queued switches. Many algorithms have been developed for transferring the cells from the VOQs to the output ports. Traditional iterative algorithms, such as iSLIP and DRRM, achieve 100% throughput under uniform traffic, but their throughput drops significantly under non-uniform traffic. Recently, a new paradigm of exhaustive matching (EM) has been introduced for handling non-uniform traffic while preserving the complexity of traditional iterative algorithms. In EM, a VOQ is served continuously until it becomes empty. Only the input ports that have finished serving a VOQ look for a new match. This strategy produces very good throughput and delay performance under both uniform and non-uniform traffic. However, under some traffic patterns, starvation occurs when a VOQ occupies an output port for an extended period of time. This problem can be eliminated by providing priority service to a VOQ that has waited an excessively long time. The resulting algorithm, prioritized EM (PEM), eliminates starvation and achieves very high throughput for many traffic patterns.

Keywords-- Switch, Scheduling, Matching, Virtual Output Queueing, Exhaustive service

I. INTRODUCTION

Input-queued switches are preferred for high-speed packet switching because the memory speed remains comparable to the input line rate[10]. However, the throughput of input-queued switches with a single FIFO queue per input is limited to 58.6% under uniform traffic due to head-of-line (HOL) blocking[5]. A common solution is to use virtual output queue (VOQ) architecture[1], where each input maintains a separate queue for each output, as shown in Fig. 1.
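As an illustration of why per-output queues remove HOL blocking, the following is a minimal sketch of one input port. The class name `InputPort` and its methods `enqueue`, `requests`, and `dequeue` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from collections import deque


class InputPort:
    """Hypothetical input port holding one FIFO queue per output (its VOQs)."""

    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        # An arriving cell is stored in the queue dedicated to its destination.
        self.voqs[output].append(cell)

    def requests(self):
        # Outputs for which this input currently holds at least one cell.
        return [j for j, q in enumerate(self.voqs) if q]

    def dequeue(self, output):
        # Remove the oldest cell destined for the matched output.
        return self.voqs[output].popleft()


port = InputPort(num_outputs=4)
port.enqueue("a", output=2)  # older cell, destined for output 2
port.enqueue("b", output=0)  # younger cell, destined for output 0

# With a single FIFO, "b" would wait behind "a" whenever output 2 is busy.
# With VOQs, the port can request both outputs and send "b" right away.
eligible = port.requests()
first_sent = port.dequeue(0)
```

In this sketch the younger cell "b" leaves first because output 0 is served while "a" stays queued for output 2. A single FIFO queue cannot do this.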
The incoming cells are queued in the VOQs at the input ports and are transferred to the output ports according to a scheduling algorithm. Many proposals have been made for scheduling the cell transfer in a VOQ switch. These scheduling algorithms are generally grouped into maximum weight matching (MWM) and maximum size matching (MSM)[6]. Although maximum weight matching guarantees 100% throughput for all traffic patterns[3][9], it is considered impractical due to its high complexity ($O(N^3 \log N)$). Maximum size matching has lower but still high complexity ($O(N^{5/2})$), so approximation algorithms are used in practice. These approximation algorithms achieve a maximal match through multiple iterations.

Fig. 1: VOQ architecture in input-queued switch (inputs 1 to N, each with its own VOQs, feeding an N x N non-blocking switch with outputs 1 to N)

From an implementation point of view, a matching algorithm can be classified as 3-phase or 2-phase[4]. A 3-phase algorithm follows these basic steps:

1. Request: Each input sends a request to every output for which it has a queued cell.
2. Grant: Each output chooses one requesting input and sends it a grant.
3. Accept: Each input chooses one of the grants it received.

Different algorithms are used depending on how contention is resolved at step 2 among multiple requests and at step 3 among multiple grants. All round-robin-based matching algorithms maintain a pointer in each input arbiter and output arbiter to resolve contention. The pointers are updated as described in Table 1, which is adapted from [4]. A request or grant is selected by choosing the one appearing next to the current pointer position. The performance of the 3-phase algorithms for a switch of size 32 at full offered load is summarized in Table 2. N/A indicates that the average delay is not applicable because of the enormous cell loss at an offered load of 1. For non-uniform traffic, local hot spot traffic is used.
(Detailed descriptions of the traffic patterns and simulation conditions appear in Section II.C.) Table 2 shows that iSLIP and FIRM perform very well for

TABLE 1: POINTER UPDATE IN 3-PHASE ALGORITHMS

| Arbiter | Event | iSLIP [8] | FIRM [11] | PIM [1] |
|---|---|---|---|---|
| Input arbiter | No grant | Unchanged | Unchanged | Random location |
| Input arbiter | Grant | One location beyond the accepted one | One location beyond the accepted one | Random location |
| Output arbiter | No request | Unchanged | Unchanged | Unchanged |
| Output arbiter | Grant accepted | One location beyond the accepted one | One location beyond the accepted one | Random location |
| Output arbiter | Grant not accepted | Unchanged | The granted one | Random location |

0-7803-7802-4/03/$17.00 © 2003 IEEE
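The request-grant-accept steps and the iSLIP and FIRM pointer rules of Table 1 can be sketched as a single matching iteration. This is a simplified illustration and not the authors' simulator. The helper names `rr_pick` and `match_once` and the `firm` flag are hypothetical. The sketch also assumes that the pointer marks the highest-priority position, so the arbiter picks the first candidate at or after it. PIM is omitted because its choices are random.

```python
def rr_pick(candidates, pointer, n):
    """Return the first candidate at or after `pointer`, scanning cyclically."""
    for k in range(n):
        idx = (pointer + k) % n
        if idx in candidates:
            return idx
    return None


def match_once(requests, grant_ptr, accept_ptr, firm=False):
    """One request-grant-accept iteration on an N x N switch (hypothetical helper).

    requests[i] is the set of outputs for which input i has a queued cell.
    grant_ptr[j] and accept_ptr[i] are the output and input arbiter pointers;
    both lists are updated in place. Returns {input: output} for accepted grants.
    """
    n = len(requests)

    # Grant: each output picks one requesting input in round-robin order.
    granted_input = {}   # output -> input it granted
    grants = {}          # input -> set of outputs that granted it
    for j in range(n):
        requesters = {i for i in range(n) if j in requests[i]}
        i = rr_pick(requesters, grant_ptr[j], n)
        if i is not None:
            granted_input[j] = i
            grants.setdefault(i, set()).add(j)

    # Accept: each granted input picks one grant; its pointer moves one
    # location beyond the accepted output. Inputs with no grant are unchanged.
    match = {}
    for i, outs in grants.items():
        j = rr_pick(outs, accept_ptr[i], n)
        match[i] = j
        accept_ptr[i] = (j + 1) % n

    # Output pointer update, following Table 1.
    for j, i in granted_input.items():
        if match[i] == j:
            grant_ptr[j] = (i + 1) % n   # grant accepted: one beyond
        elif firm:
            grant_ptr[j] = i             # FIRM: stay on the granted input
        # iSLIP: grant not accepted, pointer unchanged
    return match


# Input 1 has cells for outputs 0 and 1; input 2 has a cell for output 0.
requests = [set(), {0, 1}, {0}]

islip_grant, islip_accept = [0, 0, 0], [0, 0, 0]
islip_match = match_once(requests, islip_grant, islip_accept)

firm_grant, firm_accept = [0, 0, 0], [0, 0, 0]
firm_match = match_once(requests, firm_grant, firm_accept, firm=True)
```

In this example both outputs 0 and 1 grant input 1, which accepts output 0. The two variants produce the same match and differ only in the pointer of output 1, whose grant was not accepted: it stays put under the iSLIP rule and moves to the granted input under the FIRM rule.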