Abstract— This paper assesses VoIP quality over access
networks in Pakistan using a delay jitter measurement
methodology that evaluates the perceptual quality of voice calls
with the ITU-T G.107 speech quality E-model. Passive
measurements of voice calls in the presence of background
Internet data traffic were carried out for the G.723.1 and G.729a
codecs using a non-intrusive parametric model. The R-factor and
resultant Mean Opinion Scores (MOS) were calculated at
different link loads, and congestion hot spots were identified. The
study highlights the present inadequacy of access networks in
Pakistan for handling VoIP traffic and suggests alleviating
congestion by increasing capacity in access networks.
Index Terms—VoIP, Perceptual Quality Assessment,
E-Model.
I. INTRODUCTION
The Internet is evolving into the ubiquitous packet
switched infrastructure that aspires to provide an
Integrated Broadband Network seamlessly integrating
voice, video, data and multimedia traffic. Converging
telephone and IP networks entails providing the same
“toll-quality” service over best-effort IP networks. This is a
significant engineering challenge, bearing in mind that we
now take for granted the high voice quality standards that are a
hallmark of the public switched telephone network (PSTN).
Voice quality is ultimately judged by the listener; speech
quality is thus inherently perceptual, or subjective, in
nature. Using a numeric scale ranging from 1 (unacceptable)
to 5 (excellent), the Mean Opinion Score (MOS) test
provides a widely accepted measure of subjective speech
quality [1]. However, assessing speech quality through
surveys is a time-consuming and expensive process. A viable
alternative is to develop quality models that simulate human
rating behaviour by correlating perceptual QoS with
quantifiable parameters. This is not a straightforward process
as objective metrics do not necessarily correlate well with
‘perceptual quality’. A number of quality models and tests
that provide objective MOS measures by correlating well
with subjective scores have been developed. Rix classifies
these tests as either intrusive or non-intrusive test methods
[2]. Intrusive testing methods involve comparing a reference
acoustic speech signal with a degraded version of the signal
received through the system under test. The ITU-T
standardised the Perceptual Speech Quality Measure (PSQM)
[3] in 1996. Problems in aligning the reference and degraded
signals, which are especially accentuated in VoIP networks,
necessitated improving PSQM, and a new model called the
Perceptual Evaluation of Speech Quality (PESQ) [4] was
standardised by the ITU-T as P.862 in 2001. The E-model is
a non-intrusive parametric model that is well established as a
transmission quality model. It is defined in ITU-T G.107 [5]
and is based on the principle that transmission impairments
combine additively into a single psycho-acoustic
transmission rating (R-factor) on a scale of 0 to 100. The
R-factor can further be translated into a MOS through a simple
transformation.

This work was supported in part by a research grant from the PTCL R&D
Fund R&DF/Thematic-01/2004/06.
The authors' e-mail addresses are amir.mehmood@ptcl.net.pk,
jadoon@lums.edu.pk and adstec@mailcity.com.
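The R-factor to MOS transformation mentioned in the E-model description can be illustrated with a minimal sketch. It assumes the cubic mapping commonly quoted from ITU-T G.107; the text here does not reproduce the formula, so this may not match exactly what the authors implemented. The function name `r_to_mos` is hypothetical, and the final clamp to a minimum of 1 is an assumption of this sketch, added because the cubic dips marginally below 1 for very small R.

```python
def r_to_mos(r: float) -> float:
    """Estimate MOS from an E-model transmission rating R.

    Assumes the cubic mapping commonly quoted from ITU-T G.107:
        MOS = 1                                   for R <= 0
        MOS = 1 + 0.035 R + 7e-6 R (R-60)(100-R)  for 0 < R < 100
        MOS = 4.5                                 for R >= 100
    """
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
    # Assumption of this sketch: clamp to 1, since the cubic falls
    # slightly below 1 for very small positive R.
    return max(1.0, mos)


if __name__ == "__main__":
    # Print the estimated MOS for a few illustrative R values.
    for r in (50.0, 60.0, 70.0, 80.0, 93.2):
        print(f"R = {r:5.1f} -> MOS ~ {r_to_mos(r):.2f}")
```

Under this mapping an R-factor of about 93.2, the default rating usually quoted for the E-model, corresponds to a MOS of roughly 4.4, while R = 60 corresponds to a MOS of about 3.1.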
Sending speech as packets over the Internet entails
sampling the original voice signal at a fixed rate and
converting each sample into a fixed number of bits. This
constant bit rate stream is then either placed directly into packets
of an appropriate size or processed in frames of 10-30 ms
duration and compressed before packetization. Packets are
subsequently prefixed with RTP/UDP/IP headers. Thus, a
sample incurs algorithmic, processing and
packetization delays before it can be placed on the wire. VoIP
packets that traverse the Internet are subject to two principal
impairments, namely packet loss and packet delay. Loss may
either be due to congestion, which leads to packets being
discarded at intermediate nodes, or result from a failure
of network components such as links and/or nodes. Packet
delay has a fixed component, a consequence of the
propagation and transmission delays, as well as a variable
component that results from the variable queueing delays packets
encounter in buffers at intermediate nodes whilst
traversing the Internet. Thus, packets arriving at the receiver
do not have the same temporal relationship as they did at the
sender, resulting in delay jitter. An appropriately sized fixed
or adaptive dejitter (playout) buffer compensates for most of
this at the expense of added delay. Delay jitter manifests
itself as packet loss for packets that arrive later than a
maximum threshold and degrades the quality perceived by
the listener. The conversational quality of a call is primarily
affected by the end-to-end delay in addition to the packet loss
and delay jitter. These parameters constitute the network QoS
Assessment of VoIP Quality over Access Networks
M. Amir Mehmood, Pakistan Internet Exchange, IT Infrastructure Division, PTCL, Lahore, Pakistan
Tariq M. Jadoon, Lahore University of Management Sciences, Lahore, 54792, Pakistan
Noor M. Sheikh, University of Engineering and Technology, Lahore, 54890, Pakistan
0-7803-9179-9/05/$20.00 ©2005 IEEE.