Δ-Causality and ε-delivery for wide-area group communications

T. Tachikawa*, H. Higaki, M. Takizawa

Department of Computers and Systems Engineering, Tokyo Denki University, Ishizaka Hatoyama, Hiki-gun Saitama, Tokyo, 350-0394 Japan

Received 31 October 1997; received in revised form 31 March 1999; accepted 31 March 1999

Abstract

In distributed applications, a group of multiple processes cooperate by exchanging messages. It is critical to support the group of application processes with sufficient quality of service (QoS), including the ordered delivery of messages. The delay time and the message loss ratio are significant QoS parameters. In Internet applications, the delay time and the loss ratio differ significantly between communication channels. We define a novel causality named Δ-causality among the messages to hold in the world-wide environment. We discuss how to transmit messages to the destination processes and how to resolve message loss and delay while supporting the Δ-causality, given the requirements on delay time and message loss ratio. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Group communication protocol; Causally ordering; Δ-Causality; ε-Delivery; Wide-area group

1. Introduction

In distributed applications like teleconferences, a group of multiple processes cooperate by exchanging multimedia data. Group communication protocols support a group of processes with the reliable and ordered delivery of messages to multiple destinations in the group. Transis and others support causally ordered delivery. ISIS (ABCAST), Amoeba, Trans/Total, Rampart, and others support totally ordered delivery. The group communication protocols discussed so far assume that every communication channel has almost the same communication delay time. Most of them also assume that the communication network is reliable and often synchronous, i.e. that there is no message loss and the delay time is bounded.
The FACE project is now developing world-wide teleconferences among agents distributed in Japan, the USA and the UK. Here, let us consider a world-wide teleconference among processes K, U, S and H in Keele in the UK, UCLA in the USA, and Sendai and Hatoyama in Japan, respectively. Over the Internet, it takes about 60 ms to propagate a message within Japan, while it takes about 240 ms between Japan and Europe. In addition, the longer the distance, the more messages are lost. For example, more than 10% of the messages are lost between Japan and Europe, while less than 1% are lost within Japan. Thus, each communication channel between the processes supports a different delay time and a different level of reliability in the wide-area group. If the traditional group communication protocols are applied to the wide-area group, the time for delivering messages to the destinations is dominated by the channel with the longest delay and the lowest level of reliability. It is important to overcome these difficulties on the Internet.

In realtime multimedia applications, messages have to be delivered within some predetermined number of time units. The Δ-causality among messages has been discussed, where Δ denotes the maximum delay time between the processes required by the application. That is, it is meaningless to receive a message m unless m is delivered within Δ time units after m is transmitted. The Δ-causality assumes that every pair of processes has the same delay time Δ.

Each communication channel between a pair of processes P_i and P_j supports a different quality of service (QoS), i.e. delay time d_ij and message loss ratio ε_ij. Furthermore, d_ij and ε_ij are time-variant. For example, d_ij and ε_ij increase if the communication channel between the processes P_i and P_j is congested. In contrast, the application requires the system to support some QoS. Here, let Δ_ij and E_ij be the delay time and the message loss ratio required for a pair of processes P_i and P_j, respectively.
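The gap between the QoS a channel offers and the QoS the application requires can be made concrete with a minimal sketch. The delay and loss figures below are the approximate values quoted above for channels within Japan and between Japan and Europe; the required values (300 ms, 5% loss) and all type and function names are hypothetical, chosen only for illustration and not taken from the protocol itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelQoS:
    """QoS a channel currently offers (hypothetical helper type)."""
    delay_ms: float    # d_ij
    loss_ratio: float  # epsilon_ij


@dataclass(frozen=True)
class RequiredQoS:
    """QoS the application requires (hypothetical helper type)."""
    max_delay_ms: float    # Delta_ij
    max_loss_ratio: float  # E_ij


def satisfies(observed: ChannelQoS, required: RequiredQoS) -> bool:
    """True if the channel meets both the delay and the loss requirement."""
    return (observed.delay_ms <= required.max_delay_ms
            and observed.loss_ratio <= required.max_loss_ratio)


def dominating_delay_ms(channels: dict) -> float:
    """Delay of the slowest channel, which bounds a protocol that
    treats every channel as if it had the same delay."""
    return max(qos.delay_ms for qos in channels.values())


# Approximate figures from the text; process names S, H, K as above.
CHANNELS = {
    ("S", "H"): ChannelQoS(delay_ms=60.0, loss_ratio=0.01),   # within Japan
    ("S", "K"): ChannelQoS(delay_ms=240.0, loss_ratio=0.10),  # Japan-Europe
}

# Assumed application requirement, identical for every pair (illustrative).
REQUIRED = RequiredQoS(max_delay_ms=300.0, max_loss_ratio=0.05)

# Pairs whose channel does not meet the requirement as-is.
unmet = sorted(pair for pair, qos in CHANNELS.items()
               if not satisfies(qos, REQUIRED))

if __name__ == "__main__":
    print("slowest channel delay (ms):", dominating_delay_ms(CHANNELS))
    print("pairs not meeting the requirement:", unmet)
```

Under these assumed requirements, the channel within Japan meets both bounds, whereas the Japan-Europe channel meets the delay bound but not the loss bound.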
Here, the problem is how to support P_i and P_j with Δ_ij and E_ij, given d_ij and ε_ij in the group. If messages are lost in the network and E_ij < ε_ij, some of the lost messages have to be retransmitted. This means that the less reliable the communication channel is, the longer it takes to deliver messages to the destinations. Thus, the delay d_ij is related to the message loss ratio ε_ij.

Computer Communications 23 (2000) 13–21. 0140-3664/00/$ - see front matter. PII: S0140-3664(99)00091-2. www.elsevier.com/locate/comcom
* Corresponding author. Tel.: +81-492-962911; fax: +81-492-966185. E-mail addresses: tachi@takilab.k.dendai.ac.jp (T. Tachikawa), hig@takilab.k.dendai.ac.jp (H. Higaki), taki@takilab.k.dendai.ac.jp (M. Takizawa)

For