IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

CCD: A Distributed Publish/Subscribe Framework for Rich Content Formats

Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian

Abstract—In this paper, we propose a content-based publish/subscribe (pub/sub) framework that delivers matching content to subscribers in their desired format. Such a framework enables the pub/sub system to accommodate richer content formats, including multimedia publications with image and video content. In our proposed framework, users (consumers), in addition to specifying their information needs (subscription queries), also specify a profile that describes their receiving context, including the characteristics of the device used to receive the content (e.g., the resolution of a PDA used by a consumer). Besides matching and routing the published content, the pub/sub system also becomes responsible for converting the content into a suitable format for each user. Content conversion is achieved through a set of content adaptation operators (e.g., image transcoder, document translator). We study algorithms for placing such operators in a heterogeneous pub/sub broker overlay in order to minimize communication and computation resource consumption. Our experimental results show that careful placement of operators in the pub/sub overlay network results in significant cost reduction.

Index Terms—Publish/Subscribe, Operator placement, Customized content dissemination

APPENDIX A
NP-HARDNESS

Theorem 1: The CCD problem is NP-hard.

Proof: We show that the CCD problem is NP-hard when there is only one broker in the system. Clearly, if the problem is NP-hard for one broker, it remains NP-hard for n brokers too. To prove the theorem, it is enough to show that the NP-hard problem of computing the minimum directed Steiner tree can be reduced to an instance of the CCD problem.
The minimum directed Steiner tree problem is the following: given a directed graph $G = (V, E)$ with edge weights, a set of terminals (vertices) $S \subseteq V$, and a root vertex $r$, find a minimum-weight tree rooted at $r$ such that all vertices in $S$ are included in the tree [11]. It is easy to see that any instance of the directed Steiner tree problem is equivalent to the degenerate CCD problem where $G$ is the CAG, the vertices in $S$ correspond to the set of formats in which the content is required, and $r$ is the original format of the content. Since the CCD problem is NP-hard for the case of one broker, it remains NP-hard in the general case as well.

H. Jafarpour is with NEC Labs America. B. Hore, S. Mehrotra, and N. Venkatasubramanian are with the Department of Computer Science, University of California, Irvine, CA, 92697. E-mail: hojjat@sv.nec-labs.com, {bhore,sharad,nalini}@ics.uci.edu

APPENDIX B
MULTILAYER GRAPH REPRESENTATION OF CCD

An interesting observation is that the CCD problem can be formulated as a minimum directed Steiner tree problem on a multilayer graph constructed from the given CAG and dissemination tree. In fact, this observation was made in [12] for the multicasting problem.

A multilayer graph for the CCD problem is constructed by combining the dissemination tree and the content adaptation graph (CAG) as follows. Generate $m$ replicas of the dissemination tree, each representing a layer corresponding to a format in the CAG ($m$ is the number of formats in the CAG). The restriction is that within each layer, data can be transmitted along the edges only in the format corresponding to that layer. We denote the multilayer graph by $G_{ML} = (V, E)$, where $V = V_d \times V_c$, $V_c$ denotes the set of vertices in the CAG, and $V_d$ denotes the set of nodes in the dissemination tree. Each vertex in $V$ is therefore associated with exactly one pair, where the first member is a node in the dissemination tree and the second is a format in the CAG.
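The layered construction described so far can be illustrated with a minimal Python sketch. The three-broker tree, the two format names, and the helper functions `multilayer_vertices` and `layer_replicas` are hypothetical, chosen only for illustration; they do not come from the paper. The sketch covers only the vertex set and the per-layer replicas of the dissemination tree, and assigns no edge weights.

```python
from itertools import product

# Hypothetical toy inputs (assumptions, not from the paper): a three-broker
# dissemination tree given as directed links, and a CAG with m = 2 formats.
tree_nodes = ["b0", "b1", "b2"]
tree_links = [("b0", "b1"), ("b0", "b2")]
formats = ["F1", "F2"]


def multilayer_vertices(nodes, fmts):
    """Return the vertex set V_d x V_c as (node, format) pairs."""
    return list(product(nodes, fmts))


def layer_replicas(links, fmts):
    """Replicate the dissemination tree once per format.

    Each replica keeps the tree's links but pins both endpoints to a single
    format, reflecting the restriction that data moves within a layer only
    in that layer's format.
    """
    return {f: [((u, f), (v, f)) for (u, v) in links] for f in fmts}


vertices = multilayer_vertices(tree_nodes, formats)
layers = layer_replicas(tree_links, formats)

print(len(vertices))   # |V_d| * |V_c| vertices in total
print(layers["F2"])    # the replica of the tree in the layer for format F2
```

Representing each vertex as a plain `(node, format)` tuple keeps the pairing explicit, so the broker and the format of any vertex can be read off directly.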
For a vertex $v$ in the multilayer graph, the corresponding format in the CAG is denoted by $v.format$ and the corresponding node in the dissemination tree by $v.node$. The edge set of $G_{ML}$ comprises two kinds of edges: edges that connect two nodes in the same layer (called transmission edges) and edges that connect nodes across layers (called conversion edges). For every link in the original dissemination tree, there is a corresponding directed transmission edge in every layer. Similarly, there is a directed conversion edge joining the vertices corresponding to the same (physical) node between layers $L_i$ and $L_j$ if and only if there is an edge from format $F_i$ to $F_j$ in the CAG. The weight of a transmission edge in layer $L_i$ is equal to the transmission cost of its corresponding format, i.e., $F_i$. Similarly, the weight of a conversion edge between two layers $L_i$ and $L_j$ is the same as the conversion cost from format $F_i$ to $F_j$ in the CAG. We will assume that the transmission cost and conversion cost are measured in