Abstract— This paper assesses VoIP quality over access networks in Pakistan with a delay jitter measurement methodology that evaluates the perceptual quality of voice calls using the ITU-T G.107 E-model. Passive measurements of voice calls in the presence of background Internet data traffic were carried out for the G.723.1 and G.729a codecs using a non-intrusive parametric model. The R-factor and resultant Mean Opinion Scores (MOS) were calculated at different link loads, and congestion hot spots were identified. The study highlights the current inadequacy of access networks in Pakistan for handling VoIP traffic and suggests alleviating congestion by increasing capacity in access networks.

Index Terms—VoIP, Perceptual Quality Assessment, E-Model.

I. INTRODUCTION

The Internet is evolving into a ubiquitous packet-switched infrastructure that aspires to provide an Integrated Broadband Network, seamlessly integrating voice, video, data and multimedia traffic. Converging telephone and IP networks entails providing the same “toll-quality” service over best-effort IP networks. This is a significant engineering challenge, bearing in mind that we now take for granted the high voice quality standards that are a hallmark of the public switched telephone network (PSTN).

Voice quality is ultimately judged by the listener; speech quality is therefore inherently perceptual, or subjective, in nature. Using a numeric scale ranging from 1 (unacceptable) to 5 (excellent), the Mean Opinion Score (MOS) test provides a widely accepted measure of subjective speech quality [1]. However, assessing speech quality through surveys is a time-consuming and expensive process. A viable alternative is to develop quality models that simulate human rating behaviour by correlating perceptual QoS with quantifiable parameters. This is not a straightforward process, as objective metrics do not necessarily correlate well with ‘perceptual quality’.
This work was supported in part by a research grant from the PTCL R&D Fund R&DF/Thematic-01/2004/06. The authors' e-mail addresses are amir.mehmood@ptcl.net.pk, jadoon@lums.edu.pk and adstec@mailcity.com.

A number of quality models and tests have been developed that provide objective MOS estimates which correlate well with subjective scores. Rix classifies these tests as either intrusive or non-intrusive [2]. Intrusive testing methods involve comparing a reference acoustic speech signal with a degraded version of the signal received through the system under test. The ITU-T standardised the Perceptual Speech Quality Measure (PSQM) [3] in 1996. Problems in aligning the reference and degraded signals, which are especially accentuated in VoIP networks, necessitated improving PSQM, and a new model called the Perceptual Evaluation of Speech Quality (PESQ) [4] was standardised by the ITU-T as P.862 in 2001.

The E-model is a non-intrusive parametric model that is well established as a transmission quality model. It is defined in ITU-T G.107 [5] and is based on the principle that transmission impairments combine additively into a single psycho-acoustic transmission rating (R-factor) on a scale of 0 to 100. The R-factor can further be translated into a MOS through a simple transformation.

Sending speech as packets over the Internet entails sampling the original voice signal at a fixed rate and converting each sample into a fixed number of bits. This constant bit rate stream is then either placed directly into packets of an appropriate size or processed in frames of 10-30 ms duration and compressed before packetization. Packets are subsequently prefixed with RTP/UDP/IP headers. Thus, a sample incurs algorithmic, processing and packetization delays before it can be placed on the wire. VoIP packets that traverse the Internet are subject to two principal impairments, namely packet loss and packet delay.
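As an illustration of the transformation from R-factor to MOS mentioned above, the following Python sketch implements the mapping given in ITU-T G.107. It is a minimal example rather than the tool used for the measurements in this paper; the function name r_to_mos and the sample R values are illustrative assumptions.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model transmission rating R to an estimated MOS.

    Uses the R-to-MOS mapping of ITU-T G.107: the estimate is pinned
    to 1.0 for R <= 0 and to 4.5 for R >= 100, with a cubic in between.
    The function name is an illustrative choice, not from the paper.
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


if __name__ == "__main__":
    # Sample R values chosen for illustration only.
    for r in (60.0, 70.0, 80.0, 93.2):
        print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

With this mapping, an R-factor of 80 corresponds to a MOS of about 4.02, while an R-factor of 60 corresponds to a MOS of 3.10.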
Loss may either be due to congestion, which leads to packets being discarded at intermediate nodes, or result from a failure of network components such as links or nodes. Packet delay has a fixed component, a consequence of propagation and transmission delay, as well as a variable component that results from the variable queueing delays packets encounter in buffers at intermediate nodes whilst traversing the Internet. Thus, packets arriving at the receiver do not have the same temporal relationship as they did at the sender, resulting in delay jitter. An appropriately sized fixed or adaptive dejitter (playout) buffer compensates for most of this at the expense of added delay. Delay jitter manifests itself as packet loss for packets that arrive later than a maximum threshold and degrades the quality perceived by the listener.

Assessment of VoIP Quality over Access Networks
M. Amir Mehmood, Pakistan Internet Exchange, IT Infrastructure Division, PTCL, Lahore, Pakistan
Tariq M. Jadoon, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Noor M. Sheikh, University of Engineering and Technology, Lahore, 54890, Pakistan
0-7803-9179-9/05/$20.00 ©2005 IEEE.

The conversational quality of a call is primarily affected by the end-to-end delay, in addition to packet loss and delay jitter. These parameters constitute the network QoS