Mitigating SIP Overload Using a Control-Theoretic Approach

Yang Hong, Changcheng Huang, James Yan
Dept. of Systems and Computer Engineering, Carleton University, Ottawa, Canada
E-mail: {yanghong, huang}@sce.carleton.ca, jim.yan@sympatico.ca

Abstract—The retransmission mechanism helps SIP maintain its reliability, but it can also make an overload worse. Recent server collapses due to emergency-induced call volume in carrier networks indicate that the built-in overload control mechanism cannot handle overload conditions effectively. Since the retransmissions caused by the overload are redundant, we suggest mitigating the overload by controlling the redundant message ratio to an acceptable level. Using a control-theoretic approach, we model the interaction of an overloaded downstream server with its upstream server as a feedback control system. We then develop an adaptive PI control algorithm that mitigates the overload at the downstream server by controlling the retransmission message rate of its upstream servers. By performing OPNET simulations on two typical overload scenarios, we demonstrate that: (1) without an overload control algorithm, the overload at the downstream server may propagate to its upstream servers; (2) our control-theoretic solution not only mitigates the overload effectively, but also achieves a satisfactory target redundant message ratio.

Index Terms—SIP, Overload Control, Retransmission Rate Control, Redundant Message Ratio, Control System Stability, Phase Margin

1. INTRODUCTION

SIP (Session Initiation Protocol) [1] is becoming the dominant signaling protocol for Internet-based communication services such as Voice-over-IP, instant messaging, and video conferencing. 3GPP (3rd Generation Partnership Project) has adopted SIP as the basis of its IMS (IP Multimedia Subsystem) architecture [2-4].
With 3G (3rd Generation) wireless technology being adopted by more and more carriers, most cellular phones and other mobile devices either already use SIP for multimedia session establishment or are in the process of supporting it [3].

SIP introduces a retransmission mechanism to maintain its reliability [1, 5]. In practice, a SIP sender uses a timeout to detect message losses. One or more retransmissions are triggered if the corresponding reply message is not received within predetermined time intervals. When the message arrival rate exceeds the message processing capacity of a SIP server, overload occurs and the queue grows, which may result in a long queuing delay and trigger unnecessary message retransmissions from its upstream servers. Such redundant retransmissions increase the CPU loads of both the overloaded server and its upstream servers. This may propagate the overload and potentially lead to network collapse [4, 6-18].

SIP RFC 3261 [1] suggests that the SIP retransmission mechanism should be disabled for hop-by-hop transactions when running SIP over TCP, to avoid redundant retransmissions at both the SIP and TCP layers. However, nearly all vendors choose to run SIP over UDP instead of TCP for the following reasons [4, 6-21]: (1) The reliability function provided by TCP does not take real-time applications into account, and real-time operation is a critical requirement for SIP; (2) SIP works at the application layer while TCP works at the transport layer. Even though TCP can provide reliability at the transport layer, SIP messages can still be dropped or corrupted while being processed at the application layer; (3) Designed to prevent congestion caused by bandwidth exhaustion, the complex TCP congestion control mechanism provides little help for SIP overload, which is caused by CPU constraints.

RFC 5390 [19] identified various causes of server overload in a SIP network. These include poor capacity planning, component failures, flash crowds, denial of service attacks, etc.
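
The way a long queuing delay turns into redundant retransmissions can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the INVITE client transaction timers that RFC 3261 defines for unreliable transports (first retransmission after T1 = 500 ms, the interval doubling after each retransmission, and the transaction timing out after 64*T1), and the function names are hypothetical.

```python
def invite_retransmit_times(t1=0.5, timeout_factor=64):
    """Times (seconds after the first send) at which an unanswered
    INVITE would be retransmitted.

    Assumption: RFC 3261 timers for unreliable transports, i.e. the
    retransmission interval starts at T1 and doubles each time, and the
    transaction gives up once timeout_factor * T1 has elapsed.
    """
    times = []
    interval = t1
    elapsed = t1
    deadline = timeout_factor * t1
    while elapsed < deadline:
        times.append(elapsed)
        interval *= 2          # exponential back-off of the timer
        elapsed += interval
    return times


def redundant_retransmissions(response_delay, t1=0.5):
    """Number of retransmissions sent before the reply arrives.

    response_delay is the time (seconds) between sending the original
    request and receiving its reply, including the queuing delay at the
    downstream server. If the original request was not lost, every one
    of these retransmissions is redundant.
    """
    return sum(1 for t in invite_retransmit_times(t1) if t < response_delay)


if __name__ == "__main__":
    for delay in (0.2, 2.0, 10.0):
        print(delay, redundant_retransmissions(delay))
```

Under these assumptions, a reply that arrives within 0.2 s triggers no retransmission, whereas a reply delayed to 2 s by queuing at the overloaded server triggers two (at 0.5 s and 1.5 s), each of which adds load to both servers without improving reliability.
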
Recent collapses of SIP servers due to the “American Idol” flash crowd in real carrier networks have motivated several overload control solutions. For example, both centralized and distributed overload control mechanisms for SIP were developed in [9]. Three window-based feedback algorithms were proposed in [10] to adjust the message sending rate of the upstream SIP servers based on the queue length. Retry-after control, processor occupancy control, queue delay control, and window-based control were proposed in [6] to improve goodput and prevent overload collapse. However, these overload control proposals require the overloaded receiving server to advertise to its upstream sending servers that they should reduce their sending rates. Such a pushback control solution would produce overload propagation and block a large number of calls unnecessarily, thus reducing the revenue of the service providers.

Since retransmissions caused by the overload bring extra overhead instead of reliability to the network and exacerbate the overload [16], we suggest mitigating the overload by reducing the retransmission rate only. The contributions of this paper are: (1) Using a control-theoretic approach to model an overloaded downstream server and its upstream server as a feedback control system (see Fig. 4); (2) Proposing a novel PI control algorithm to mitigate the overload and achieve a satisfactory target redundant message ratio by controlling the retransmission rate; (3) Performing OPNET simulations under two typical overload scenarios to validate our overload control algorithm. The simulation results demonstrate that our control-theoretic solution can help the overloaded downstream server mitigate the overload and prevent the overload from propagating to its upstream servers.

2. SIP RETRANSMISSION MECHANISM OVERVIEW

Fig. 1 describes a basic SIP operation among an originating UA (User Agent), a SIP P-server (Proxy-server), and a terminating UA.
To set up a call, an originating UA sends an “Invite” request to