Theory Comput Syst
DOI 10.1007/s00224-014-9599-8
FNB: Fast Non-Blocking Coordinated Checkpointing
Protocol for Distributed Systems
Zohra Abdelhafidi · Mohamed Djoudi ·
Nasreddine Lagraa · Mohamed Bachir Yagoubi
© Springer Science+Business Media New York 2015
Abstract This paper presents a Fast Non-Blocking (FNB) coordinated checkpointing
protocol for distributed systems that aims to minimize the number of requests and
mutable checkpoints while reducing checkpointing latency. Our protocol relies on
two mechanisms. The first is piggybacking dependency information on computation
and reply messages, thereby tracking direct, transitive, and hidden dependencies
among processes. The second is the use of popular processes: because processes
communicate with one another, it is preferable that the checkpointing procedure be
initiated by popular processes, which hold more dependency information. Doing so
may reduce both the checkpointing latency and the likelihood that checkpointing
halts because of a fault. We also present a simulation study that compares our
protocol with the CSNB protocol (Cao and Singhal Non-Blocking) and the CSB
protocol (Cao and Singhal Blocking).
Keywords Distributed systems · Fault tolerance · Coordinated checkpointing ·
Dependency · Popular process
Z. Abdelhafidi · M. Djoudi · N. Lagraa · M. B. Yagoubi
Computer Science and Mathematic Laboratory, Amar Telidji University, Road of Ghardaia,
BP 37G, Laghouat 03000, Algeria
e-mail: z.abdelhafidi@mail.lagh-univ.dz
M. Djoudi
e-mail: m.djoudi@mail.lagh-univ.dz
N. Lagraa
e-mail: n.lagraa@mail.lagh-univ.dz
M. B. Yagoubi
e-mail: m.yagoubi@mail.lagh-univ.dz