An Architecture for Centralized SIP-based Audio
Conferencing using Application Layer Multicast
José Simões¹, Ravic Costa¹, Paulo Nunes¹,³, Rui Lopes¹,², Laurent Mathy⁴
¹ Departamento de Ciências e Tecnologias da Informação - Instituto Superior das Ciências do Trabalho e da Empresa (ISCTE)
² Associação para o Desenvolvimento das Técnicas e Tecnologias de Informação (ADETTI)
³ Instituto de Telecomunicações (IT)
Lisboa, Portugal
{Jose.Simoes, Ravic.Costa, Paulo.Nunes, Rui.Lopes}@iscte.pt
⁴ Computing Department - Lancaster University
Lancaster, United Kingdom
laurent@comp.lancs.ac.uk
Abstract — Audio conferencing is an important aspect of Internet
Telephony services. In this article, using a centralized
conferencing architecture, we propose to employ application
layer multicast for media distribution, by using “agents”
responsible for the delivery of streaming media to end-clients,
aiming at reducing the traffic in the network and the server
workload. In this model, we distinguish between an active client,
which exchanges media directly with the server, and a passive
client, which only receives media from its agent. In our
implementation we use the Real-time Transport Protocol (RTP) for data
transmission and the Session Initiation Protocol (SIP) for
signaling, hence making it compatible with most existing state-of-
the-art hardware and software.
Keywords — Audio Conferencing; SIP; ALM;VoIP
I. INTRODUCTION
With the spread of Internet Telephony services, there is a
need to bring several traditional telephony functionalities to
this new environment. Conferencing is no exception. An
audio conference call is a telephone call (IP or not) where the
calling party wants to have more than one called party listen in
to the audio. Conferencing is now widely used in many
different applications and scenarios, whether for
entertainment, social, educational or business purposes.
In our proposal, we intend to serve large-scale applications,
with a large number of participants, where a few of them are
producing media (active participants), while the majority
(passive participants) are just listening to what is produced.
Conferencing models can be classified as centralized, full
mesh, unicast receive and multicast send, multicast, and end
mixing [1]. This classification is based
on the topology of signaling, media delivery and architecture
component relationships [1]. In this paper we consider the
centralized conference scenario where a server receives media
streams from all participants, mixes them if needed, and
redistributes the appropriate media stream(s) back to the
participants. This model has the advantage that clients do not
need to be modified to perform media mixing and transcoding.
In addition, it is relatively easy to support heterogeneous media
clients [1]. However, since it is difficult for each sender to subtract
its own contribution from the mix, the server needs to create a
customized stream for each active caller (see, e.g., [2]). Assuming that
not all users are using the same media format, the server needs
to decode the audio streams to a non-compressed audio format
to mix them. After that, it encodes the mixed stream in the
appropriate media format for each of the participants. This
leads to scalability problems in media distribution and server
workload, limiting the number of participants in a conference
call.
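The per-caller mixing step described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in this work: it assumes the streams have already been decoded to 16-bit linear PCM frames of equal length, the name `mix_frames` is hypothetical, and delivering the full mix to passive listeners is an assumption made for the example.

```python
from typing import Dict, List, Tuple

PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit signed linear PCM range (assumed format)


def clamp(sample: int) -> int:
    """Clip a summed sample back into the 16-bit range."""
    return max(PCM_MIN, min(PCM_MAX, sample))


def mix_frames(
    frames: Dict[str, List[int]]
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Mix one decoded frame per active participant (hypothetical helper).

    Returns the full mix (assumed here to go to passive listeners) and,
    for each active participant, a customized mix that excludes that
    participant's own contribution.
    """
    if not frames:
        return [], {}
    length = len(next(iter(frames.values())))
    if any(len(samples) != length for samples in frames.values()):
        raise ValueError("all frames must have the same number of samples")

    # Sum all contributions sample by sample, before clipping.
    total = [sum(column) for column in zip(*frames.values())]

    full_mix = [clamp(s) for s in total]
    # Each active caller hears the total minus its own samples.
    customized = {
        name: [clamp(t - own) for t, own in zip(total, samples)]
        for name, samples in frames.items()
    }
    return full_mix, customized
```

With N active callers this produces N customized streams, each of which would still have to be encoded in the receiver's media format, which is where the server workload grows.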
To mitigate some of the limitations of the centralized
conferencing model, notably, the amount of traffic in the
network and the server workload, we decided to study the
impact of multicast for media distribution.
Since no inter-domain multicast routing protocol is globally
deployed at the network layer, we propose the use
of Application Layer Multicast (ALM) for media distribution
in our architecture [3]. Another important
requirement for our architecture is that it should be SIP-based
[4]. This will allow the use of popular software for both
terminals and server (such as X-Lite, Kphone, SER, etc.). For
media distribution, RTP is used over either unicast or ALM
connections, depending on whether clients are currently
active or passive.
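The choice between the two delivery paths can be sketched as follows. This is a minimal illustration under stated assumptions rather than the architecture's actual logic: the `Client` type, the agent identifiers, and the functions `delivery_path` and `server_outbound_streams` are hypothetical, and the count assumes, for illustration only, that the server sends a single stream to each distinct agent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Client:
    name: str
    active: bool
    agent: Optional[str] = None  # hypothetical agent id, used by passive clients


def delivery_path(client: Client) -> str:
    """Active clients use unicast RTP with the server; passive ones use ALM."""
    if client.active:
        return "unicast:server"
    if client.agent is None:
        raise ValueError(f"passive client {client.name} has no agent")
    return f"alm:{client.agent}"


def server_outbound_streams(clients: List[Client]) -> int:
    """Count distinct outbound destinations of the server.

    Assumption: one stream per active client plus one per distinct agent;
    the agents then deliver the media to their passive clients.
    """
    destinations = set()
    for client in clients:
        path = delivery_path(client)
        # Active clients each get their own stream; passive clients that
        # share an agent collapse into a single destination.
        destinations.add(client.name if client.active else path)
    return len(destinations)
```

Under these assumptions, a conference with two active clients and three passive clients spread over two agents needs four outbound streams from the server instead of five.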
II. RELATED WORK
Many conference servers on the market today are H.323-based.
However, SIP-based conferencing systems are gaining
more and more enthusiasts [1]. As an example, in [1], a
centralized SIP-based conferencing system provides a suitable
multimedia conferencing platform that allows advanced
scenarios and services (e.g., transcoding) without requiring
end systems to be conferencing-aware.
Another important aspect to consider in audio
conferencing is how the media is delivered. In
ISBN: 1-9025-6013-9 © 2006 PGNet