New York Science Journal, 2012; 5(2); http://www.sciencepub.net/newyork

DYNAMIC SATELLITE-BASED DISTRIBUTED WEB CACHING

Namit Gupta and Rajeev Kumar
Computer Sc. & Engg. Department, Teerthanker Mahaveer University, Moradabad, Uttar Pradesh, India
namit.k.gupta.coe@tmu.ac.in, rajeev2009mca@gmail.com

Abstract: The World Wide Web is growing exponentially and already accounts for a large share of Internet traffic. Distributed web caching systems suffer from poor scalability and robustness because proxy servers become overloaded and congested. Load balancing and clustering of proxy servers speed up page retrieval, but by themselves cannot ensure the robustness of the system. In this paper we propose a solution for the scalability and robustness of a distributed web caching system, covering load balancing, clustering, and metadata manageability. We have also refined our technique by extensively analyzing the log entries of the Eurecom and other Squid caches [8] in order to show what hit rates can be achieved with dynamic allocation of requests. We devised an algorithm that combines distributed web caching with satellite-based clusters of proxy servers organized by geographical region. It improves scalability by maintaining the metadata of neighbors collectively, which raises the hit ratio, and it dynamically shifts load from congested proxy servers to less congested ones, so the system does not go down unless all proxy servers are fully loaded; this yields higher robustness. The algorithm also guarantees consistency between the original server object and the proxy cache objects using a semaphore. [Namit Gupta and Rajeev Kumar. Dynamic Satellite based Distributed web caching. New York Science Journal 2012;5(2):20-26]. (ISSN: 1554-0200). http://www.sciencepub.net/newyork

Keywords: Distributed web caching; satellite-based clustering; latency; hit ratio; metadata; robustness
1. INTRODUCTION

As the World Wide Web (WWW) gains more and more popularity, servers have to handle correspondingly more requests. The more clients request resources (in this case, files) from web servers, the faster the servers must accept and process those requests. To cope with these requirements, both programmers and system administrators must take countermeasures. Since the beginning of the WWW, the demands on servers have changed not only in traffic volume but also in the type of content they deliver. Initially, only static pages had to be served; today, content is usually taken from a database and pages are generated dynamically. This shifts the main source of load away from the operating system, which merely reads files from disk or another type of storage, to the program that dynamically generates the page. Computer hardware has also evolved, which is what makes today's dynamically generated pages practical.

Generally speaking, servers are capable of serving most pages in a reasonable amount of time, as long as only a small number of visitors request pages. The larger the number of clients, the more pages must be generated simultaneously. Multitasking enables servers to do so, but CPU capacity is limited. System administrators can add hardware (for instance, server clustering and load balancing), but often only to a certain extent, mainly for financial but also for logistical reasons. Programmers, on the other hand, can optimize algorithms (an O(n²) algorithm on a fast computer is easily overtaken by an O(n) algorithm on a slower one) and apply caching techniques. The basis of this work is the analysis of caching strategies for this scenario; they will be used to speed up an existing application.
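The caching idea discussed above can be illustrated with a minimal in-memory page cache. The sketch below is illustrative only and is not taken from the paper: `generate_page` is a hypothetical stand-in for an expensive, database-backed page build, and the LRU eviction policy and capacity are assumed design choices.

```python
from collections import OrderedDict


def generate_page(path):
    # Hypothetical stand-in for expensive dynamic page generation
    # (database queries, template rendering, etc.).
    return f"<html><body>Content for {path}</body></html>"


class PageCache:
    """Minimal in-memory page cache with fixed capacity and LRU eviction."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._store = OrderedDict()  # insertion order tracks recency
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._store:
            self._store.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._store[path]
        self.misses += 1
        page = generate_page(path)  # regenerate only on a miss
        self._store[path] = page
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return page

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Repeated requests for the same path are then served from memory rather than regenerated, which is exactly the kind of CPU-load reduction the text argues caching provides; the hit ratio measures how often that shortcut succeeds.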
The combination of various methods will be tested and benchmarked until the application runs at reasonable speed even under high load. The most popular web sites suffer from server congestion, since they receive thousands of requests every second, whether or not these coincide with special events. Moreover, the heterogeneity and complexity of the services and applications provided by web server systems are continuously increasing: traditional publishing sites with mostly static content are being integrated with newer commerce and transactional sites that combine dynamic and secure services. The most obvious way to cope with growing service demand and application complexity is to add hardware resources, but replacing an existing machine with a faster model provides only temporary relief from server overload. Furthermore, the number of requests per second that a single server machine can handle is limited and cannot scale up with demand. Two common approaches to implement a large scale cache