Dominating Set Theory based Semantic Overlay Networks for Efficient and Resilient Content Distribution

J. Amutharaj
Arulmigu Kalasalingam College of Engineering/Department of CSE, Srivilliputhur, India.
Email: amutharajj@yahoo.com

S. Radhakrishnan
Arulmigu Kalasalingam College of Engineering/Department of CSE, Srivilliputhur, India.
Email: srk@akce.ac.in

Abstract— Recently, overlay networks have emerged as an efficient and flexible method for content distribution. An overlay network is a network running on top of another network, usually the Internet. It is formed by a subset of the underlying physical nodes. The overlay nodes are connected by overlay links, each of which is usually composed of one or more physical links. Such networks are employed in many settings to provide a logical communication infrastructure over existing communication networks. The main objective here is to reduce the routing path lengths that are stretched by the overlay routing process. Existing solutions require a kind of fixed infrastructure, in the form of excessive message exchange, to guarantee good overlay properties. The scope of our effort is to construct an overlay network based on Dominating Set Theory that optimizes the number of nodes involved in large data transfers. The Fast Replica algorithm is applied to reduce the content transfer time when replicating content within the semantic network. A dynamic parallel access scheme is introduced to download a file from different peers of the Semantic Overlay Network (SON) in parallel: end users access the members of the SON at the same time, fetch different portions of the file from different peers, and reassemble them locally. That is, the load is dynamically shared among all the peers. To eliminate the need for retransmission requests from end users, an enhanced digital fountain with Tornado codes is applied.
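The digital fountain idea can be illustrated with a small sketch. The text does not give the Tornado-code construction, so the example below swaps in a simpler LT-style fountain code (ideal soliton degree distribution with a peeling decoder) rather than Tornado codes, which use a fixed-rate cascade of sparse bipartite graphs. What it shows is the property the abstract relies on: the sender keeps emitting encoded symbols, the receiver silently ignores whatever is lost, and decoding finishes once enough symbols arrive, with no retransmission requests. The names `fountain_transfer`, `PeelingDecoder`, `soliton_degree`, and the parameters `k`, `loss`, `seed`, and `max_symbols` are hypothetical.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size source blocks, zero-padding the tail."""
    size = max(1, -(-len(data) // k))  # ceiling division, at least one byte
    padded = data.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def soliton_degree(k: int, rng: random.Random) -> int:
    """Sample a degree from the ideal soliton distribution (LT codes, not Tornado)."""
    weights = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return rng.choices(range(1, k + 1), weights=weights)[0]


def encode_symbol(blocks: list[bytes], rng: random.Random) -> tuple[set[int], bytes]:
    """One encoded symbol: the XOR of a random subset of source blocks."""
    degree = soliton_degree(len(blocks), rng)
    neighbours = set(rng.sample(range(len(blocks)), degree))
    value = bytes(len(blocks[0]))  # all-zero block
    for i in neighbours:
        value = xor_bytes(value, blocks[i])
    return neighbours, value


class PeelingDecoder:
    """Recovers source blocks from whatever symbols arrive; sends no feedback."""

    def __init__(self, k: int):
        self.k = k
        self.known: dict[int, bytes] = {}
        self.pending: list[tuple[set[int], bytes]] = []

    def add(self, neighbours: set[int], value: bytes) -> None:
        self.pending.append((set(neighbours), value))
        progress = True
        while progress:
            progress = False
            remaining = []
            for nbrs, val in self.pending:
                # XOR out every block that has already been decoded.
                solved = [i for i in nbrs if i in self.known]
                for i in solved:
                    val = xor_bytes(val, self.known[i])
                    nbrs.discard(i)
                if len(nbrs) == 1:
                    self.known[nbrs.pop()] = val  # degree-1 symbol releases a block
                    progress = True
                elif nbrs:
                    remaining.append((nbrs, val))
            self.pending = remaining

    def done(self) -> bool:
        return len(self.known) == self.k


def fountain_transfer(data: bytes, k: int = 8, loss: float = 0.3, seed: int = 1,
                      max_symbols: int = 10_000) -> tuple[bytes, int]:
    """Stream symbols over a lossy channel until the receiver can decode."""
    rng = random.Random(seed)
    blocks = split_blocks(data, k)
    decoder = PeelingDecoder(k)
    sent = 0
    while not decoder.done():
        if sent >= max_symbols:
            raise RuntimeError("decoding did not finish within max_symbols")
        symbol = encode_symbol(blocks, rng)
        sent += 1
        if rng.random() < loss:
            continue  # symbol lost in the network; it is never re-requested
        decoder.add(*symbol)
    recovered = b"".join(decoder.known[i] for i in range(k))
    return recovered[:len(data)], sent


if __name__ == "__main__":
    message = b"Fast Replica distributes large files across the SON."
    recovered, sent = fountain_transfer(message, k=8, loss=0.3, seed=7)
    print(recovered == message, sent)
```

Because every encoded symbol is equally useful to the receiver, losses only delay completion instead of triggering a retransmission round trip; a production scheme would use a carefully designed degree distribution (or Tornado/Raptor codes) to keep the decoding overhead small.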
A decoding algorithm at the receiver reconstructs the original content, so no feedback mechanism is needed to ensure reliable delivery. This paper analyzes the performance of the sequential unicast, multiple unicast, and Fast Replica with Tornado codes content distribution strategies in terms of content replication time and delivery ratio. It also analyzes the impact of Dominating Set Theory on the construction of semantic overlay networks.

Index Terms—Semantic Overlay Networks, Dominating Set Theory, Multicasting, Fast Replica, Content Distribution, Replication Time, Delivery Ratio.

I. INTRODUCTION

Peer-to-peer networks are emerging as a significant vehicle for providing distributed services (e.g., search, content integration, and administration) both on the Internet [1] and in enterprises. Content Delivery Networks (CDNs), based on a large-scale distributed network of sites located closer to the edges of the Internet, are used for efficient delivery of digital content, including software packages and multimedia content. Locating content in decentralized peer-to-peer systems is a challenging problem. Ensuring the availability of content on the Internet is expensive, and content providers have only a few options: use premium content hosting services, build and manage their own content distribution infrastructure, or contract with a Content Delivery Network [2]. The main goal of a CDN architecture is to minimize the network impact in the critical path of content delivery and to overcome the overload problem, which is a serious threat for busy sites serving popular content. For typical web documents served via a CDN, there is no need to actively replicate the original content at the edge servers. For large documents, software packages, and media files, however, it is desirable to replicate these files at the edge servers in advance, and for large files this is a challenging, resource-intensive problem [3].
Media files can require significant bandwidth and download time due to their large sizes. To offload popular servers and improve the end-user experience, copies of popular content are often stored in different locations. With mirror site replication, documents from a primary site are proactively replicated at secondary sites. When copies of the same document exist at multiple servers, choosing the server that provides the best response time is not trivial, and the resulting performance can vary dramatically depending on the server selected [4,5]. Instead of downloading the entire document from one server, a user downloads different parts of the same

42 JOURNAL OF NETWORKS, VOL. 3, NO. 3, MARCH 2008 © 2008 ACADEMY PUBLISHER