Future Generation Computer Systems 29 (2013) 936–952

Resource requirement prediction using clone detection technique

Madhulina Sarkar a, Triparna Mondal b, Sarbani Roy b,*, Nandini Mukherjee b

a Department of Computer Sc. and Engg., Govt. College of Engineering and Leather Technology, Kol-98, India
b Department of Computer Science and Engineering, Jadavpur University, Kolkata – 32, India

* Corresponding author. E-mail addresses: madhulina.sarkar@gmail.com (M. Sarkar), sarbani.roy@cse.jdvu.ac.in, sarbani.roy@gmail.com (S. Roy), nmukherjee@cse.jdvu.ac.in (N. Mukherjee).

Article history: Received 13 January 2012; received in revised form 21 September 2012; accepted 26 September 2012; available online 24 October 2012

Keywords: Resource management; Job modeling; Code cloning; Software metrics

Abstract

In order to maintain the QoS requirements of jobs running on a large distributed system, such as a Cloud or Grid environment, the resource requirements of jobs should be predicted prior to their submission, so that appropriate resources can be selected for their execution on the basis of this prediction. However, because of the dynamic and heterogeneous nature of modern distributed systems, estimating resource requirements is a challenging task. This paper presents a feedback-based job modeling scheme based on a clone detection technique. In this scheme, the execution data for each job that runs in the environment is stored in an Execution History. A newly submitted job is analyzed to find its clones in the Execution History, and the resource requirement of the new job is predicted on the basis of the data stored there. Different levels of clones are discussed in this paper and a metric-based clone detection technique is presented. An automatic resource requirement prediction scheme for jobs is proposed. The paper also evaluates a preliminary implementation of the scheme and discusses the results of using the scheme for some test codes.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

A large distributed system, such as a Cloud or Grid, comprises a dynamic pool of heterogeneous resources that are distributed over a large geographical area and administered under multiple domains. Such a dynamic, heterogeneous pool of resources is managed by a resource management component, which is also responsible for allocating appropriate resources to jobs [1–5]. A resource management component dynamically identifies and characterizes the resources and allocates them to the jobs submitted by users. However, to find a suitable resource for a job, the resource requirements of the job must first be assessed. So far, in most existing middleware [6–11], the resource requirements of a job are supplied by the user (usually the owner of the application) on the basis of some domain knowledge. Such estimates are generally inaccurate and often lead to overestimation or underestimation of the resource requirements. Overestimation wastes computing resources. Underestimation, on the other hand, may prevent a job from achieving the desired performance, or even from completing its execution successfully.

Thus, the objective of this paper is to use a feedback-based job modeling technique for resource requirement prediction. The concept of job modeling can be effectively used to characterize the resource requirements of jobs. The data model used for this purpose is called a job model. When a newly submitted job is accepted in the system, its job model is built and matched against the jobs executed earlier in the environment, and the job is categorized according to its clone level with respect to those earlier jobs [12].
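The workflow outlined above (build a job model, match it against previously executed jobs, and predict from their recorded runs) can be illustrated with a small sketch. This is not the authors' implementation: the metric names (`loc`, `loops`, `calls`), the relative Euclidean distance, the threshold of 0.2, and the use of a plain mean as the statistical predictor are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ExecutedJob:
    """One Execution History entry: a job model plus observed resource usage."""
    name: str
    metrics: Dict[str, float]  # hypothetical software metrics of the job's code
    cpu_seconds: float         # resource usage recorded at run time
    memory_mb: float           # resource usage recorded at run time


def metric_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance over relative metric differences (assumed measure)."""
    total = 0.0
    for key in sorted(set(a) | set(b)):
        x, y = a.get(key, 0.0), b.get(key, 0.0)
        scale = max(abs(x), abs(y), 1.0)  # avoid division by zero
        total += ((x - y) / scale) ** 2
    return sqrt(total)


def find_clones(new_metrics: Dict[str, float],
                history: List[ExecutedJob],
                threshold: float = 0.2) -> List[ExecutedJob]:
    """Return earlier jobs whose metrics lie within the (assumed) threshold."""
    return [job for job in history
            if metric_distance(new_metrics, job.metrics) <= threshold]


def predict_requirements(new_metrics: Dict[str, float],
                         history: List[ExecutedJob],
                         threshold: float = 0.2) -> Optional[Dict[str, float]]:
    """Predict resource needs as the mean usage of the detected clones."""
    clones = find_clones(new_metrics, history, threshold)
    if not clones:
        return None  # no clone found: no history-based prediction possible
    return {
        "cpu_seconds": mean(job.cpu_seconds for job in clones),
        "memory_mb": mean(job.memory_mb for job in clones),
    }


# Hypothetical Execution History and a newly submitted job.
history = [
    ExecutedJob("matmul_v1", {"loc": 120, "loops": 3, "calls": 4}, 40.0, 512.0),
    ExecutedJob("matmul_v2", {"loc": 126, "loops": 3, "calls": 4}, 44.0, 520.0),
    ExecutedJob("sort_v1", {"loc": 60, "loops": 2, "calls": 9}, 5.0, 64.0),
]
new_job = {"loc": 123, "loops": 3, "calls": 4}

print(predict_requirements(new_job, history))
# prints {'cpu_seconds': 42.0, 'memory_mb': 516.0}
```

In this toy history, only the two `matmul` entries fall within the threshold, so the prediction is the average of their recorded usage. A real system would distinguish several clone levels and could apply a different statistic to each level.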
Matching is performed using techniques discussed later in this paper. After the clone level of the newly submitted job has been identified with respect to previously executed jobs, the runtime data of those jobs are retrieved from the Execution History, and statistical methods are applied to predict the resource requirements of the new job.

doi:10.1016/j.future.2012.09.010

Thus, this paper presents a feedback-guided scheme for estimating the resource requirements of a job before its submission to the system. Once the job is submitted, its execution is guided, and resources are added or removed, based on an adaptive execution scheme presented in [4,5]. The adaptive scheme has been implemented as a tool called PRAGMA, which accepts a batch of jobs from a user along with their resource requirements, performs initial resource allocation for the batch based on the user-provided information and the current status of the resources, and later, on the basis of the run-time performance of the jobs, reschedules (migrates) jobs to different resource providers or takes other tuning actions. In the present implementation of PRAGMA, resource requirement prediction at the time of initial allocation of jobs onto Grid resources is left to the users, who may have little or no idea of the actual resource requirements of the jobs. The job modeling techniques presented in this paper are proposed