Abstract

Unpredictable access to batch-mode HPC resources is a significant problem for emerging dynamic data-driven applications. Although efforts such as reservation and queue-time prediction have partially addressed this problem, approaches based strictly on space sharing impose fundamental limits on real-time predictability. In contrast, our earlier work investigated the use of feedback-controlled virtual machines (VMs), a time-sharing approach, to deliver predictable execution. However, that work did not fully address usability and implementation efficiency. This paper presents an online, software-only version of the feedback-controlled VM, called the self-tuning VM, which we argue is a practical approach for predictable HPC infrastructure. Our evaluation using five widely used applications shows that our approach is both predictable and practical: by simply running time-dependent jobs with our tool, we meet a job's deadline typically within 3% error, and within 8% error for the more challenging applications.

I. INTRODUCTION

Many pioneering projects, including real-time mesoscale weather prediction [4], coastal hazard prediction [5], and patient-specific medical modeling [6], have started to explore the opportunities and challenges that arise when scientific modeling is used to process real-time environmental events. This emerging class of HPC jobs must produce results within explicit, possibly evolving, deadlines because of their dependence on real-time data. The most difficult challenge today for such applications is that HPC infrastructures are typically operated in shared batch mode and offer predictability neither for an HPC job's start time nor for its duration. Although a dedicated supercomputer may solve the problem, the high cost of acquiring and maintaining resources that could sit idle for a significant amount of time makes it an impractical solution.
Most existing research in this area thus attempts to eliminate a job's wait time via advance reservation [1][7][8][9], despite potentially severe resource underutilization [9]. Moreover, reservation requires strict planning that can involve time-consuming interactions between users and resource providers (e.g., TeraGrid requires reservations be made at least one week in advance). The sporadic nature of dynamic events may not permit such planning.

Our earlier results [3] introduced a fundamentally different approach to solving HPC unpredictability. Instead of attempting to achieve predictability by controlling an HPC job's wait time and granting exclusive access to a resource, our Compute Throttling Framework controls a job's running time by hosting jobs in virtualized resources, called performance containers, and throttling the job's access to resources up or down. We use a feedback controller to dynamically supply system resources to, or remove them from, the container(s). We showed that we are able to achieve predictable run-time performance without requiring exclusive access to resources, while still being reactive to unexpected events (e.g., new job arrivals, within limits).

However, a significant limitation of [3] is that arguably only experts in control theory were realistic candidates for using our system. For example, our run-time system required a broad, quantitative understanding of the target application's behavior in a variety of situations in order to regulate application progress dynamically. Sophisticated knowledge of control theory was necessary to determine the feedback controller parameters through a manual modeling process (e.g., in MATLAB).

The research reported in this paper significantly improves the usability of our control-theoretic approach while retaining the good controller performance that previously resulted from comprehensive manual modeling by an expert.
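The throttling idea described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the controller from [3] or from this paper: it assumes a proportional-integral (PI) control law, a linear progress target, and a job whose speed scales linearly with its CPU share. The function name `simulate_job`, its parameters, and the gain values are all hypothetical and chosen for illustration.

```python
def simulate_job(total_work, deadline, full_speed, interference,
                 kp=10.0, ki=0.5, dt=1.0, min_share=0.05):
    """Simulate a job whose CPU share is throttled by a PI controller.

    total_work   -- abstract work units the job must complete
    deadline     -- target completion time in seconds
    full_speed   -- work units per second at a 100% CPU share (assumed linear)
    interference -- function t -> fraction of speed lost to co-located load
    Returns the simulated completion time in seconds.
    """
    # Feed-forward guess: the share that would meet the deadline exactly
    # if there were no interference at all.
    base_share = total_work / (deadline * full_speed)
    share = base_share
    done = 0.0
    elapsed = 0.0
    integral = 0.0

    while done < total_work:
        # "Plant": the job progresses according to its current share.
        speed = full_speed * share * (1.0 - interference(elapsed))
        done += speed * dt
        elapsed += dt

        # Controller: compare measured progress with a linear target and
        # throttle the share up (behind schedule) or down (ahead).
        target = total_work * min(elapsed / deadline, 1.0)
        error = (target - done) / total_work
        integral += error * dt
        share = base_share + kp * error + ki * integral
        share = max(min_share, min(1.0, share))  # a share is a bounded resource

    return elapsed


if __name__ == "__main__":
    steady_load = lambda t: 0.3  # hypothetical: 30% of speed lost to other jobs
    fixed = simulate_job(1000.0, 200.0, 10.0, steady_load, kp=0.0, ki=0.0)
    tuned = simulate_job(1000.0, 200.0, 10.0, steady_load)
    print(f"fixed share finished at {fixed:.0f} s, PI-throttled at {tuned:.0f} s")
```

With the gains set to zero the share never changes, so the interference pushes the job well past its deadline. With feedback enabled, the share rises to compensate and the job finishes close to the deadline. Choosing values such as `kp` and `ki` for a real application is the kind of manual tuning step that limits usability.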
Self-Tuning Virtual Machines for Predictable eScience
Sang-Min Park and Marty Humphrey
Department of Computer Science, University of Virginia, Charlottesville, VA 22904

We achieve usability by creating a self-tuning VM that performs application