Policy-based Management of Semantic Clustering

Dominic Jones, John Keeney, David Lewis, Declan O’Sullivan
Knowledge & Data Engineering Group (KDEG) – Trinity College Dublin, Ireland
{Jonesdh | John.Keeney | Dave.Lewis | Declan.OSullivan}@cs.tcd.ie

1. INTRODUCTION

We introduce the concept of a Knowledge-based Network (KBN) [1] as an extended Content-based Network (CBN), in which semantically rich messages co-exist with the more traditional CBN message format. One of the main advantages of a KBN is its ability to define subscription filters across the semantics of message contents, in addition to the commonly available, traditional CBN filters. KBNs support the routing of semantically enriched messages between interested parties across a common network of message brokers. They introduce an unprecedented level of semantic richness, which allows semantic clusters of publishers and subscribers to form within a KBN, creating pockets of focused interest. When this clustering is exploited, direct performance increases can be seen [1]. We propose a flexible policy-driven mechanism to manage the future semantic clustering of Knowledge-based Networks.

2. KBN CLUSTERING

Within Knowledge-based Networks, publishers and subscribers direct their publications or subscriptions towards one or more brokers. Clusters of publishers, subscribers and brokers form around groups of users interested in the same content. Baldoni et al. [2] present an “architecture based on clustering peers subscribed to the same topic”, and Anceaume et al. [3] describe a system in which “subscribers self-organize according to similarity relationships among their subscriptions”. These works support the argument that a generalized clustering technique is of benefit within a Distributed Event-based System. It has been shown that the overall network and broker performance of a Knowledge-based Network can be increased through semantic clustering [1].
The provision of semantically rich publications and subscriptions allows for an even stronger level of semantic clustering than shown in current work. For example, basing the network around an ontology in which academic conferences are represented enables the clustering of academics interested in research areas, conferences, locations, dates and the relationships between these concepts, as opposed to static references to possible publications. When a user’s interests change, the ontology changes, and thus the clusters formed around the users and their ontologies should also change. This allows a drift in interests to be represented semantically. That drift must also be reflected in the clusters themselves, and is seen as an important change in the underlying structure of the KBN.

3. MOTIVATION

In a clustered Knowledge-based Network, some brokers deal with a focused range of semantics within the set of messages they can process. This both categorizes the messages passing across the broker/network and moves towards a loose guarantee that a message arriving at a broker is of interest to that broker (or group of brokers) and to the subscribers and publishers connected to it. Through the clustering proposed in this paper we see three main benefits beyond a normal semantically enhanced publish/subscribe system. These benefits allow KBNs to:

1) Reduce and optimize the routing and subscription information held at each node, by reducing the set of interests applicable to that node;
2) Reduce the number of hops between related producers and consumers, by semantically clustering both around relevant brokers within the network;
3) Increase the number of subscription-aggregation occurrences, which reduces and optimizes the set of subscriptions held in any particular subscription table.
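As an illustration of the subscription aggregation named in the third benefit, the following is a minimal sketch rather than the mechanism used in the KBN implementation. It assumes that a subscription is a single concept name meaning "this concept or anything more specific", and that the ontology is a simple single-parent hierarchy. The concept names, the PARENT mapping and the functions subsumes, aggregate and matches are all hypothetical and chosen to echo the academic-conference example.

```python
# Hypothetical concept hierarchy (child -> parent). These names are
# illustrative assumptions, not concepts taken from the paper's ontology.
PARENT = {
    "SemanticWebConference": "Conference",
    "NetworkingConference": "Conference",
    "Conference": "AcademicEvent",
    "Workshop": "AcademicEvent",
}


def subsumes(general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)  # None once the root has been passed
    return False


def aggregate(subscriptions):
    """Drop every subscription that is covered by a different, more general one."""
    unique = set(subscriptions)
    return sorted(
        s for s in unique
        if not any(o != s and subsumes(o, s) for o in unique)
    )


def matches(table, publication_concept):
    """A publication matches if any remaining subscription subsumes its concept."""
    return any(subsumes(s, publication_concept) for s in table)


table = aggregate([
    "SemanticWebConference",
    "Conference",
    "NetworkingConference",
    "Workshop",
])
print(table)  # ['Conference', 'Workshop']
print(matches(table, "SemanticWebConference"))  # True
```

Four subscriptions collapse to two table entries, so a publication is compared against fewer filters. The sketch only models the reduced table used for forwarding decisions; a real broker would still need to record which subscriber registered each original subscription in order to deliver matched messages.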
A smaller and more ordered subscription table decreases the time taken to match a publication to a possible set of subscriptions. Additionally, each broker within the network holds an ontological representation of the knowledge base on which it may be required to reason; by introducing clustering, we can reduce the size of this knowledge base and the subsequent reasoning overhead at each broker. This methodology still depends upon an upper-level broker network which knows of all topics at a high level, so that an unknown message can be passed up the router chain until it reaches a router that knows of the knowledge presented in the message and can forward it correctly, at least towards the correct cluster.

Semantic clustering demands the natural grouping of clients across a network of brokers, which pushes towards a refinement of, and movement away from, the publish/subscribe “anywhere” methodology. Although a message inserted anywhere into the network must still be routed towards any interested subscriber, we must try to group related clients and brokers. This aligns with the vision of using an overlay network, where clients (and sub-brokers) may not necessarily connect to a geographically close broker, but rather offset the decreased network performance with more optimized, semantic, application-level concerns. Given that scalability and accessibility need to be central in a semantically enriched operational environment, it becomes important that decisions are made which are both relevant to and representative of the network’s operation, as prescribed by service-level agreements and by managerial decisions made at as high a level of governance as possible.

4. MANAGEABILITY

Having established the driving force behind clustering, the semantics associated with those clusters, and the performance gains shown through the coupling of the two [1], it is possible to address the future direction of this research. By visualizing a