MNCM: A New Class of Efficient Scheduling Algorithms for Input-Buffered Switches with No Speedup

Vahid Tabatabaee
Department of Electrical and Computer Engineering and Institute for Systems Research
University of Maryland at College Park
Email: Vahid@glue.umd.edu

Leandros Tassiulas
Department of Electrical and Computer Engineering and Institute for Systems Research
University of Maryland at College Park
Email: Leandros@glue.umd.edu

Abstract—In this paper, we use fluid model techniques to establish new results for the throughput of input-buffered switches. In particular, we introduce a new class of deterministic maximal size matching algorithms that achieves 100% throughput. Dai and Prabhakar [3] have shown that any maximal size matching algorithm with a speedup of 2 achieves 100% throughput. We introduce a class of maximal size matching algorithms, which we call maximum node containing matching (MNCM) algorithms, and prove that they achieve 100% throughput with no speedup. We also introduce a new weighted matching algorithm, maximum first matching (MFM), with complexity O(N^2.5), that belongs to the MNCM class. MFM is, to the best of our knowledge, the lowest complexity deterministic algorithm that delivers 100% throughput. The only assumption on the input traffic is that it satisfies the strong law of large numbers. Besides throughput, average delay is the other key performance metric for input-buffered schedulers. We use simulation results to study the delay performance of MFM; they demonstrate promising delay performance.

I. INTRODUCTION

Due to progress in optical transmission technology and the growth of Internet traffic, very fast switching technologies are necessary for Internet core and edge switches and routers. Among the different switch fabric architectures, input-buffered switches are one of the most popular for high-speed data networks.
There are four major elements in an input-buffered switch fabric architecture. The first is the input buffer, which stores incoming cells or packets. The second is the switching block, a crossbar that connects input ports to output ports; note that at any time every input can be connected to at most one output port, and vice versa. The third is the scheduler, which determines which input port is connected to which output port and configures the crossbar accordingly. The fourth is the output buffers. Buffering at the output ports is required only if the switch fabric has speedup and operates at a higher rate than the input and output lines.

One of the main reasons behind the popularity of the input-buffered architecture is that it has the lowest memory speed requirements. This is especially true when input-buffered switches have no speedup, because in that case the access rate to the crossbar and the memories is equal to the line rate. If we use an input-buffered architecture with speedup k, then the switch fabric memories and crossbar must work k times faster than the line rate. In the extreme case, an output-buffered switch is similar to an input-buffered switch with a speedup of N, where N is the number of switch ports.

The first challenge for input-buffered switches is their throughput performance. It is well known that, due to head-of-line (HOL) blocking, the throughput of input-buffered switches under i.i.d. Bernoulli arrivals is limited to 58.6% [7]. Virtual output queueing (VOQ) eliminates this problem by maintaining a separate queue for each output at every input [1]. In fact, it has been shown that with a suitable scheduling (matching) algorithm, input-buffered switches with VOQs can achieve 100% throughput [12], [8], [9], [3]. The main challenge is to design low complexity scheduling algorithms that achieve 100%, or at least reasonably high, throughput.
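To make the crossbar constraint and the VOQ structure described above concrete, here is a minimal sketch in Python; the names (voq, is_matching, transfer) and the 4-port size are illustrative assumptions, not from the paper.

```python
from collections import deque

# Hypothetical switch size; illustrative only.
N = 4

# Virtual output queueing: input i keeps a separate FIFO per output j,
# so a blocked head-of-line cell destined to one output cannot delay
# cells destined to other outputs.
voq = [[deque() for _ in range(N)] for _ in range(N)]

def is_matching(schedule):
    """A crossbar configuration is a matching: each input is connected
    to at most one output, and each output to at most one input."""
    inputs = [i for i, _ in schedule]
    outputs = [j for _, j in schedule]
    return len(inputs) == len(set(inputs)) and len(outputs) == len(set(outputs))

def transfer(schedule):
    """In one time slot, move one cell across every matched pair."""
    assert is_matching(schedule)
    for i, j in schedule:
        if voq[i][j]:
            voq[i][j].popleft()
```

For example, the schedule [(0, 1), (1, 0)] is a valid crossbar configuration, while [(0, 1), (0, 2)] is not, since input 0 would have to connect to two outputs at once.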
Stability and throughput of input-buffered switches is a well studied problem. In [12], [8] it is proved that the maximum weighted matching (MWM) algorithm achieves 100% throughput. In [8], the number of backlogged packets and the maximum delay of waiting packets in each VOQ are considered as two potential weight functions. In another work [9], Mekkittikul and McKeown considered the case where weights are associated with the ports rather than the links, and showed that the proposed algorithm, longest port first (LPF), achieves 100% throughput. The complexity of LPF is also O(N^3), although for practical purposes it seems more favorable than MWM [10]. Stability of these algorithms is proven under the assumption of i.i.d. arrivals. Tassiulas [13] has also introduced a class of randomized iterative algorithms that achieve 100% throughput for i.i.d. arrivals. The complexity of the proposed algorithm is O(N^2), but it is straightforward to introduce an O(N) algorithm in this class as well. In [5], modifications to the original algorithm are proposed to improve its performance. One of the problems with randomized scheduling algorithms is their poor and non-

0-7803-7753-2/03/$17.00 (C) 2003 IEEE    IEEE INFOCOM 2003
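The MWM algorithm discussed above selects, in each time slot, the matching with the largest total weight (e.g., VOQ backlogs). A brute-force sketch for very small N is below; it is purely illustrative and assumes weight = queue length — practical MWM implementations run in O(N^3), not by enumerating all N! permutations.

```python
from itertools import permutations

def mwm(weights):
    """Exhaustive maximum weight matching for an N x N switch.

    weights[i][j] is the weight of connecting input i to output j,
    e.g. the backlog of VOQ (i, j). Returns the best matching as a
    list of (input, output) pairs, together with its total weight.
    """
    n = len(weights)
    best_perm, best_w = None, -1
    for perm in permutations(range(n)):  # each permutation is a full matching
        w = sum(weights[i][perm[i]] for i in range(n))
        if w > best_w:
            best_w, best_perm = w, perm
    return [(i, best_perm[i]) for i in range(n)], best_w
```

For instance, with backlogs [[3, 0], [2, 2]] the identity matching has weight 3 + 2 = 5, while the crossed matching has weight 0 + 2 = 2, so MWM picks the former.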