Autonomic Data Placement Strategies for Update-intensive Web Applications

Swaminathan Sivasubramanian, Guillaume Pierre, Maarten van Steen
Dept. of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
Email: {swami,gpierre,steen}@cs.vu.nl

Abstract

Edge computing infrastructures have become the leading platform for hosting Web applications. One of the key challenges in these infrastructures is the replication of application data. In our earlier research, we presented GlobeDB, a middleware for edge computing infrastructures that performs autonomic replication of application data. In this paper, we study the problem of data unit placement for update-intensive Web applications in the context of GlobeDB. Our hypothesis is that there exists a continuous spectrum of placement choices between complete partitioning of sets of data units across edge servers and full replication of data units to all servers. We propose and evaluate different families of heuristics for this problem of replica placement. As we show in our experiments, a heuristic that takes into account both the individual characteristics of data units and the overall system load performs best.

1. Introduction

Edge service architectures have become the most widespread platform for distributing Web content over the Internet. Commercial Content Delivery Networks (CDNs) like Akamai [1] and Speedera [14] deploy edge servers around the Internet that locally cache (static) Web pages and deliver them from servers located close to the clients. However, the past few years have seen significant growth in the amount of Web content generated dynamically using Web applications. These Web applications are usually database-driven and generate Web content based on individual user profiles, request parameters, etc.
When a request arrives, the application code examines the request, issues the necessary read or update transactions to the database, retrieves the data, and composes the page, which is then sent back to the client. Traditional CDNs use techniques such as fragment caching, whereby the static fragments (and sometimes also certain dynamic parts) of a page are cached at the edge servers [5, 10, 6]. However, the growing need for personalization of content (which leads to poor temporal locality among requests) and the presence of data updates significantly reduce the effectiveness of these solutions. To handle such applications, CDNs often employ edge computing infrastructures where the application code is replicated at all edge servers. Database accesses then become the major performance bottleneck. This warrants the use of database caching solutions, which cache certain parts of the database at the edge servers and keep them consistent with the central database. However, these infrastructures require the database administrator to manually define which part of the database should be placed at which edge server.

In our earlier work, we described the design and implementation of GlobeDB, an autonomic replication middleware for edge computing infrastructures. The distinct feature of GlobeDB is that it performs autonomic placement of application data by monitoring accesses to the underlying data. Instead of replicating all data units at all edge servers, GlobeDB automatically replicates data only to the edge servers that access them often. GlobeDB provides Web-based data-intensive applications with the same advantages that CDNs offer to traditional Web sites: low latency and reduced network usage [13]. The data placement heuristics developed in this previous work assumed that the number of data update requests is relatively low compared to that of the read requests.
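The intuition behind such access-driven placement can be illustrated with a simple cost model. This is a hypothetical sketch, not the actual GlobeDB heuristic: the constants and the `placement_cost` function are illustrative assumptions. A replica of a data unit pays off at an edge server only when the savings from serving that server's reads locally outweigh the cost of applying every update at the extra copy.

```python
# Hypothetical per-data-unit cost model (illustration only; not taken
# from GlobeDB). A replica at a server saves the remote-access cost for
# that server's reads, but every replica must apply every update.

C_READ = 1.0     # assumed service cost of a local read
C_UPDATE = 10.0  # updates assumed ~10x more expensive than reads
C_REMOTE = 5.0   # assumed extra cost of forwarding a read to a replica

def placement_cost(reads_per_server, total_updates, replicas):
    """Total system cost for one data unit under a given replica set."""
    cost = 0.0
    for server, reads in reads_per_server.items():
        if server in replicas:
            cost += reads * C_READ               # served locally
        else:
            cost += reads * (C_READ + C_REMOTE)  # forwarded to a replica
    # Every update must be applied at every replica.
    cost += total_updates * C_UPDATE * len(replicas)
    return cost

reads = {"A": 900, "B": 50, "C": 50}  # server A issues most reads
updates = 100

full = placement_cost(reads, updates, {"A", "B", "C"})
partial = placement_cost(reads, updates, {"A"})
# Under this model, full replication costs more than a single
# well-placed copy once updates are frequent, even though servers
# B and C must then read remotely.
print(full, partial)
```

Under a read-dominated workload the comparison flips, which is why a placement heuristic must weigh per-data-unit read and update counts rather than apply one global policy.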
While this assumption is often true, there exists a class of applications that receive a large number of updates. For example, a stock exchange Web site that allows its customers to bid on or sell stocks in real time is likely to receive large quantities of updates (the New York Stock Exchange receives on the order of 700 update requests per second [8]).

Replicating an update-intensive application while maintaining consistency among the replicas is difficult because each update to a given data unit must be applied at every server that holds a copy of it. In such settings, creating extra replicas of a data unit can have the paradoxical effect of increasing the global system's load rather than decreasing it. This can be a significant problem, as the service time to update a data unit is usually an order of magnitude higher than that to read one.

Placing replicas for update-intensive applications war-