JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 23, 1325-1337 (2007)

Taking Point Decision Mechanism for Page-level Incremental Checkpointing based on Cost Analysis of Process Execution Time *

SANGHO YI, JUNYOUNG HEO, YOOKUN CHO AND JIMAN HONG +
School of Computer Science and Engineering
Seoul National University
Seoul 151-172, Korea
+ School of Computing
Soongsil University
Seoul 156-743, Korea
E-mail: jiman@ssu.ac.kr

Incremental checkpointing, which is intended to minimize checkpointing overhead, saves only the modified pages of a process. Consequently, the time consumed by an incremental checkpoint varies with the number of modified pages, and efficient checkpointing intervals have to be determined at run time. In this paper, we present an efficient and adaptive page-level incremental checkpointing facility based on a taking point decision mechanism that minimizes the total execution time. Our simulation results show that the expected execution time was significantly reduced compared with existing periodic page-level incremental checkpointing.

Keywords: checkpoint and recovery, page-level incremental checkpointing, fault tolerance, Linux kernel, operating system reliability

1. INTRODUCTION

Checkpointing is an effective mechanism that allows a process interrupted by a system failure to resume its execution without having to restart from the beginning [1, 2]. By taking checkpoints, a process can resume its execution from the most recent checkpoint state, which limits the reprocessing time that would otherwise be necessary when a failure occurs. That is, checkpointing can reduce the expected execution time of a process in the presence of system failures. However, checkpointing involves trade-offs: the overhead of taking checkpoints must be weighed against its intended objectives, such as minimum recovery time and minimum process execution time [3].
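The observation that incremental checkpointing saves only the modified pages, so that its cost depends on how many pages have changed, can be illustrated with a small user-space sketch. This is a toy model under stated assumptions, not the kernel facility presented in this paper: the names `PAGE_SIZE`, `split_pages`, and `incremental_checkpoint` are hypothetical, the 4096-byte page size is assumed, and modified pages are detected by comparing content hashes, which stands in for the page-table based tracking (dirty bits or write protection) that a kernel implementation would typically use.

```python
# Toy model of page-level incremental checkpointing (illustrative only).
# All names here are hypothetical; hash comparison is a stand-in for
# hardware/kernel dirty-page tracking.
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def split_pages(memory, page_size=PAGE_SIZE):
    """Split a flat memory image (bytes) into fixed-size pages."""
    return [memory[i:i + page_size] for i in range(0, len(memory), page_size)]


def incremental_checkpoint(memory, previous_digests, page_size=PAGE_SIZE):
    """Save only the pages that changed since the previous checkpoint.

    Returns (saved_pages, digests): saved_pages maps page index to page
    contents for every modified page; digests maps every page index to its
    current hash and is passed to the next call.
    """
    saved_pages = {}
    digests = {}
    for index, page in enumerate(split_pages(memory, page_size)):
        digest = hashlib.sha256(page).digest()
        digests[index] = digest
        if previous_digests.get(index) != digest:
            saved_pages[index] = page  # page is new or modified
    return saved_pages, digests


if __name__ == "__main__":
    memory = bytearray(8 * PAGE_SIZE)

    # First checkpoint: no history, so every page is saved.
    first, digests = incremental_checkpoint(bytes(memory), {})

    # The process modifies two pages before the next checkpoint.
    memory[PAGE_SIZE + 10] = 0xFF
    memory[5 * PAGE_SIZE] = 0x01
    second, digests = incremental_checkpoint(bytes(memory), digests)

    print("pages saved at first checkpoint:", len(first))
    print("pages saved at second checkpoint:", sorted(second))
```

In this sketch the amount of data written at each checkpoint is proportional to `len(saved_pages)`, which is why a fixed checkpointing interval can be a poor fit when the rate of page modification changes during execution.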
Received November 15, 2006; accepted February 15, 2007.
Communicated by Sung Shin and Tei-Wei Kuo.
* This research was supported by the Soongsil University Research Fund and the Brain Korea 21 project.
+ Corresponding author.

Several techniques [2, 4-7] have been devised and implemented to minimize these checkpointing overheads. They can be divided into two groups [2]. The first consists of latency-hiding optimization techniques, such as diskless checkpointing [6], forked checkpointing [5], and compression checkpointing [5], which attempt to reduce or hide the disk-writing overhead. The second consists of size-reduction techniques, such as memory exclusion checkpointing [7] and incremental checkpointing [4, 5], which attempt to minimize the