Quality of Service and Real Time MPI

Athanasios Margaris, Stavros Souravlas, and Manos Roumeliotis
University of Macedonia, Technology Management Department, Naousa, Macedonia, Greece

Abstract—Quality of service in computer networks is usually associated with streamed data such as voice or video. However, it is also required by other applications that depend on timely delivered messages. This paper presents the main aspects of quality of service and the way it is offered in a parallel application that uses MPI functions. The focus is on the main principles and features of QoS, as well as on the data structures that the MPI/RT (Real Time MPI) design uses to provide such facilities.

Index Terms—parallel programming, quality of service, MPI/RT, message passing interface

I. QUALITY OF SERVICE

Quality of service (QoS) is a fundamental issue in process communication over a networked environment. It is defined as the provision of data communication services in a consistent and reliable way, such that the system is able to meet the user's timing requirements. This objective can be achieved through the collaboration of all the components of the communication system, such as the routers and the computing nodes. QoS is needed because the TCP/IP protocol suite cannot guarantee that network packets arrive at their destinations in time: the IP layer provides only best-effort routing of the data from the source to the destination. Although this approach does not affect the performance of traditional, well-known applications such as FTP and Web services, real-time applications such as audio and video streaming impose additional restrictions, since they require a large bandwidth as well as a very small latency.
More specifically, the most important parameters associated with QoS are the following [1]:
• Latency is defined as the time interval for message exchange between sender and receiver, namely the duration of the packet transmission through the communication channel. This duration includes the delay due to the store-and-forward handling of the data packets by the intermediate routers.
• Jitter is the variation in the delay of successive packets. It is associated with the desynchronization effect that appears when the data packets are received by the destination host with a timing, or in an order, other than the one used during their transmission.
• Channel bandwidth is a measure of the channel capacity, that is, the maximum theoretical data rate of a connection. The larger the channel capacity, the larger the amount of data that can be transmitted through the channel per unit of time.
• Packet loss is defined as the percentage of the packets that are lost during transmission.

The QoS feature in IP networks is implemented by two different methods. In the first method, Integrated Services (IS), which uses the Resource Reservation Protocol (RSVP), each router can recognize the different packet types and treat them in different ways so as to meet the user needs. In the second method, Differentiated Services (DS), the data packets are classified with respect to some fields of their header, such as the source and destination ports and IP addresses, and the required QoS is then supplied to each one of them.

Another mechanism associated with QoS is policing, which ensures that a network station is not allowed to transmit data at arbitrarily high rates. It is complemented by traffic shaping, which smooths bursts in order to prevent packet rejection by the policing mechanism. Finally, one should mention the admission control technique, which determines whether a request for QoS support should be accepted by the system, as well as the priority and scheduling mechanisms.
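
The relation between policing and traffic shaping described above can be illustrated with a token bucket, a common way to express a rate limit with a bounded burst. The paper does not name the algorithm it has in mind, so the following Python fragment is only a minimal sketch under that assumption; the class name `TokenBucket` and the parameters `rate` and `burst` are hypothetical and chosen for illustration.

```python
class TokenBucket:
    """Illustrative token bucket (an assumption, not the paper's design).

    Tokens accrue at `rate` bytes per second, up to `burst` bytes.
    """

    def __init__(self, rate, burst):
        self.rate = float(rate)    # sustained rate in bytes per second
        self.burst = float(burst)  # bucket depth: largest tolerated burst
        self.tokens = self.burst   # start with a full bucket
        self.last = 0.0            # time of the previous update, in seconds

    def _refill(self, now):
        # Add the tokens earned since the last update, capped at the depth.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def conforms(self, size, now):
        """Policing: accept the packet if enough tokens exist, else reject."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

    def delay_until_conforming(self, size, now):
        """Shaping: seconds the sender should hold the packet back.

        Only meaningful for size <= burst; a larger packet never conforms.
        """
        self._refill(now)
        deficit = size - self.tokens
        return max(0.0, deficit / self.rate)


bucket = TokenBucket(rate=1000, burst=1500)
first = bucket.conforms(1500, now=0.0)              # burst fits the bucket
second = bucket.conforms(500, now=0.0)              # bucket is now empty
wait = bucket.delay_until_conforming(500, now=0.0)  # shaper's hold time
third = bucket.conforms(500, now=wait)              # accepted after waiting
```

In this sketch the policer simply rejects the second packet, whereas a shaper placed in front of it would hold that packet for `wait` seconds so that it conforms on arrival. Time is passed in explicitly to keep the example deterministic.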
If the system supports the priority feature, each packet can be assigned a priority value according to its associated service type, while the scheduling mechanisms, such as GPS (Generalized Processor Sharing), WRR (Weighted Round Robin), WFQ (Weighted Fair Queuing) and CBQ (Class Based Queuing), ensure that the utilization of the CPU by the system processes is fair.

In the case of parallel real-time applications, the application performance depends strongly on the distribution of the system resources. This is especially true for networked applications, where process contention can lead to great performance degradation in TCP/IP networks. The situation may be far worse in message passing interface applications, due to their increased complexity, since: (a) the data communication is usually characterized by packet bursts, because in these applications computation stages are followed by communication stages and vice versa; (b) the communication of the computing nodes is based on TCP/IP and, therefore, a high-level communication request may give rise to a large number of low-level requests; and (c) in most cases the operation of the parallel application is based not on point-to-point but on collective operations, which increases the traffic load.

II. MPICH-G2 AND MPI/RT

There are two MPI implementations capable of providing QoS support to parallel applications: MPICH-G2 [2] and MPI/RT. The MPICH-G2 implementation allows the use of the MPI protocol in wide area networks that provide QoS support based on GARA (General-purpose Architecture for Reservation and Allocation). The MPI/RT (Real Time MPI) implementation, on the other hand, allows the development of real-time applications that meet the QoS requested by the user.