J. Cent. South Univ. (2014) 21: 3864−3872
DOI: 10.1007/s11771-014-2373-x

Task scheduling scheme by checkpoint sharing and task duplication in P2P-based desktop grids

Joon-Min Gil, Young-Sik Jeong

1. School of IT Engineering, Catholic University of Daegu, 13-13, Hayang-ro, Hayang-eup, Gyeongsan-si, Gyeongbuk 712-701, Korea;
2. Department of Multimedia Engineering, Dongguk University, 30 Pildong-rol-gil, Jung-gu, Seoul 100-715, Korea

© Central South University Press and Springer-Verlag Berlin Heidelberg 2014

Abstract: A scheduling scheme is proposed to reduce execution time by means of both checkpoint sharing and task duplication under a peer-to-peer (P2P) architecture. In the scheme, the checkpoint produced by each peer (i.e., a resource) serves as an intermediate result that is duplicated, transmitted to other peers, and resumed there. The closer a checkpoint is to the final result, the greater the reduction in execution time for each task, which in turn shortens turnaround time. To evaluate the performance of our scheduling scheme in terms of transmission cost and execution time, an analytical model with an embedded Markov chain is presented. We also conduct simulations that incorporate a task failure rate and compare the performance of our scheduling scheme with that of the existing scheme based on a client-server architecture. Performance results show that our scheduling scheme is superior to the existing scheme with respect to the reduction of execution time and turnaround time.

Key words: P2P-based desktop grids; checkpoint sharing; task duplication; embedded Markov chain

1 Introduction

Desktop grids are a practical computing paradigm that can process massive computational tasks in various application areas by using the idle cycles of heterogeneous resources (generally desktop computers) that are connected over the Internet and owned by different individual users.
They are generally suitable for large-scale applications composed of hundreds of thousands of small tasks that run the same computational code. It is well known that desktop grids make it possible to obtain large-scale computing power at a low cost [1−2]. Since the success of SETI@Home [3−4], a variety of desktop grid platforms, such as BOINC [5−6], XtremWeb [7], Korea@Home [8], SZTAKI [9], and QADPZ [10], have been developed. Commercial desktop grid systems, such as Entropia [11] and United Devices [12], have been released for enterprise computing, and some practical applications of desktop grids are reported in Refs. [13−14].

An important aspect of desktop grids is that each resource is volatile: it can freely withdraw from participation, even in the middle of task execution. Moreover, resources are heterogeneous, since each has a different computing environment (e.g., CPU performance, memory capacity, and network speed) [15]. One critical issue in a desktop grid environment is to minimize the execution time of all tasks even though these two properties adversely affect overall performance [1]. Unexpected failures are a degrading factor in minimizing execution time; they can be partially addressed by using a checkpointing mechanism at the application level [16−17]. Another method of minimizing the execution time is to share the checkpoints performed on each resource [18]. Checkpoint sharing reuses, on another resource, the checkpoint most recently taken on a local desktop (i.e., the intermediate result of a task is transmitted to other resources so that task execution can be restarted from the last checkpoint position). Consequently, the purpose of checkpoint sharing is to reduce the execution time of tasks, leading to a reduction in turnaround time. Most desktop grid systems, however, use a client-server model as their main architecture [6, 11, 19].
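The effect of checkpoint sharing on execution time can be illustrated with a minimal sketch. The names `Checkpoint` and `run_task`, the unit-based notion of work, and all numbers below are illustrative assumptions and are not taken from the scheme itself; the sketch only shows how a second resource that receives the last checkpoint avoids re-executing work already completed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Checkpoint:
    """Hypothetical intermediate result: work units already completed."""
    task_id: int
    units_done: int


def run_task(task_id: int, total_units: int, interval: int,
             resume_from: Optional[Checkpoint] = None,
             fail_at: Optional[int] = None
             ) -> Tuple[bool, Optional[Checkpoint], int]:
    """Execute a task, optionally resuming from a shared checkpoint.

    Returns (finished, last_checkpoint, units_executed_on_this_resource).
    `fail_at` models a resource withdrawing once that many units are done.
    """
    done = resume_from.units_done if resume_from else 0
    last = resume_from
    executed = 0
    while done < total_units:
        if fail_at is not None and done >= fail_at:
            return False, last, executed  # resource left mid-task
        done += 1
        executed += 1
        if done % interval == 0:
            last = Checkpoint(task_id, done)  # save intermediate result
    return True, last, executed


# Resource A withdraws after 47 of 100 units; its last checkpoint is at 40.
ok_a, shared, units_a = run_task(0, 100, 10, fail_at=47)
# Resource B receives the shared checkpoint and resumes from unit 40.
ok_b, _, units_b = run_task(0, 100, 10, resume_from=shared)
print(ok_a, shared.units_done, units_a, ok_b, units_b)
```

In this toy setting, the second resource executes only the units after the shared checkpoint instead of all of them, which is the saving that checkpoint sharing aims for; transmission cost is deliberately ignored here.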
Although the client-server model is simple in architecture as well as in its control of resources and tasks, it concentrates all functions on the central server, which aggravates the bottleneck at the server. Moreover, in the client-server model, checkpoint sharing is based on storing checkpoints in a central stable

Received date: 2013−11−20; Accepted date: 2014−01−16
Corresponding author: Young-Sik Jeong; E-mail: ysjeong@dongguk.edu