Journal of Digital Information Management G Volume 2 Number 2 G June 2004 79 Abstract: Increasingly interactions which entities have with each other are network bound. The entities, with which applications andservices need to interact, span a wide spectrum that includes desktops, PDAs, appliances, and other networked resources. These events encapsulate information pertaining to transactions, data interchange, system conditions and finally the search, discoveryand subsequent sharing of resources. Events have internal or external (system computed) destinations associated with them. In the case of search, discovery and publish/subscribe interactions the system has to efficiently calculate destinations from the corresponding events. This computing of destinations is referred to as matching and is, in itself, a distributed process, which operates on the distributed management of client interests (advertisements and subscriptions). The distributed nature of the underlying messaging infrastructure also mandates an efficient routing engine. Inefficient approaches to either the calculation of, or routing to, destinations can result in unacceptable network degradations. In this paper we explore matching, routing and network utilization issues in the context of NaradaBrokering, which provides support for centralized, distributed and P2P interactions. We also discuss the implications, and include results, pertaining to the different matching engines supported within the NaradaBrokering system. Key words: distributed messaging, publish/subscribe, middleware, matching, event based systems Reviewed and accepted: 31 Mar. 2004 1. Introduction The Internet is currently being used to support increasingly complex interactions. The devices, with which applications and services need to interact, span a wide spectrum that includes desktops, PDAs, appliances, and other networked resources. Clients – which abstract users, resources and proxies thereto – within these systems communicate with each other through the exchange of events, which are essentially messages with timestamps. These events encapsulate information pertaining to transactions, data interchange, system conditions and finally the search, discovery and subsequent sharing of resources. Scaling, availability and fault tolerance requirements entail that the messaging infrastructure hosting these clients, and routing their interactions, be based on a distributed network of cooperating nodes. As the scale of the system increases, effective interactions between clients and services, in these settings, is dictated not just by the processing power of the nodes hosting a specific service but also by the network cycles expended during these interactions. Events have internal or external (system computed) destinations associated with them. In the case of search, discovery and publish/ subscribe interactions, the system has to efficiently calculate destinations from the corresponding events. This computing of destinations is referred to as matching and is, in itself, a distributed process, which operates on the distributed management of client interests (advertisements and subscriptions). Furthermore, the distributed nature of the underlying messaging infrastructure mandates an efficient routing engine that can compute and traverse efficient paths to reach target destinations. We suggest that inefficient approaches to either the calculation of, or routing to, destinations can result in unacceptable network degradations and network flooding. Poor solutions to network utilizations lead to buffer overflows, queuing delays, network clogging and other related problems that add up considerably over a period of time. Although multicasting and bandwidth reservation protocols such as RSVP [1] and ST-II [2] can help in better utilizing the network, they require support at the router level. There needs to be a conceited effort to ensure the efficient utilization of networks and networked communal resources. More importantly, the underlying solution should incorporate sophisticated matching engines needed to provide support for increasingly complex and sophisticated qualifiers, for specifying constraints, that events should satisfy prior to being considered for delivery to applications. In this paper we explore matching, routing and network utilization issues in the context of our research system NaradaBrokering [3-12], which provides support for centralized, distributed and peer-to-peer (P2P) interactions [13]. NaradaBrokering has been tested in synchronous and asynchronous applications, including as a media server for audio-video conferencing. Depending on the type of interactions routed and the corresponding matching engines supported, the underlying messaging infrastructure/middleware could be viewed either as a distributed light-weight relational or XML database. We also discuss the implications, and include results, pertaining to the different matching engines supported within the NaradaBrokering system. This paper is organized as follows. In Section 2 we discuss related work in the area. In section 3 we identify the core issues relevant to supporting efficient interactions within the system; sections 4, 5, 6 and 7 elaborate on these core issues of broker topology, organization of client interests/constraints (profiles), routing of events and support for multiple matching engines respectively. Finally in Section 8 we outline conclusions and future work. 2. Related Work Different systems address the problem of event delivery to relevant clients in different ways. In Elvin [14] network traffic reduction is accomplished through the use of quench expressions, which prevent clients from sending notifications for which there are no consumers. This, however, entails each producer to be aware of all the consumers and their subscriptions. Ref [15] outlines a strategy to convert each subscription in Elvin into a deterministic finite state automaton. This conversion, and the matching solutions, nevertheless can lead to an explosion in the number of states. In Sienna [16] optimization strategies include assembling patterns of notifications The user browses the ontology choosing one of the input point (left panel of the frame) representing the taxonomies of the ontology and navigates visiting the sub tree topics until reaching a concept of interest.The concept of interest is shown in the middle of the right panel of the frame and related concepts are displayed around it. The ontology may be browsed by promoting any of the related concepts to be the central concept. The new central concept is then linked to all its related concepts as close as possible to the publishers, while multicasting notifications as close as possible to the subscribers. In Gryphon [17] each broker maintains a list of all subscriptions within the system in a parallel search tree (PST). The PST is annotated with a trit vector encoding link routing information. These annotations are then used at matching time by a server to determine which of its neighbors should receive that event. Approaches for exploiting group based multicast for event delivery is discussed in Ref [18]. The Event Service [19] approach adopted by the OMG is one of establishing channels and subsequently registering suppliers and consumers to the event channels. The approach could entail clients (consumers) to be aware of a large number of event channels. The Notification Service [20] addresses limitations pertaining to the lack of event filtering capability. However it attempts to preserve all the semantics specified in Event Service while allowing for interoperability between clients from the Efficient Matching of Events in Distributed Middleware Systems Shrideep Pallickara and Geoffrey Fox Community Grids Laboratory Indiana University, IN. USA 47401 {spallick,gcf}@indiana.edu