Efficient Recovery Information Management Schemes for the Fault Tolerant Mobile Computing Systems

Taesoon Park
Department of Computer Engineering, Sejong University, Seoul 143-747, KOREA
tspark@kunja.sejong.ac.kr

Namyoon Woo and Heon Y. Yeom
Department of Computer Science, Seoul National University, Seoul 151-742, KOREA
{nywoo,yeom}@arirang.snu.ac.kr

Abstract

This paper presents region-based storage management schemes that support the efficient implementation of checkpointing and message logging for fault tolerant mobile computing systems. In the proposed schemes, a recovery manager assigned to a group of cells takes care of the recovery of the mobile hosts within that region. As a result, the recovery information of a mobile host, which may be dispersed over the network due to the mobility of the host, can be handled efficiently.

1. Introduction

Since mobile hosts (MHs) are vulnerable to failures, it is desirable for a mobile computing system to be equipped with a proper recovery facility, and checkpointing-recovery is one of the services used to build a fault-tolerant system. For the mobile environment, extended schemes of coordinated checkpointing, communication-pattern-based checkpointing and communication-induced checkpointing have been proposed in [5], [1] and [2], respectively. For asynchronous recovery, optimistic and pessimistic logging schemes are considered in [3] and in [4, 6], respectively. To save the checkpoints and message logs of a MH, the stable storage of the mobile support stations (MSSs) is used, because the MHs lack sufficient storage space. Hence, as the MH moves, its recovery information becomes dispersed over a number of MSSs, and in case of a failure, the MH must collect the proper information from a number of MSSs.
Considering the cost of transferring the recovery information during a hand-off or of collecting it during recovery, efficient management of the distributed information becomes an important issue in designing a fault tolerant mobile system. Some suggestions for this problem have been made in [4, 6]. For fast recovery, a MH in [4] carries its recovery information as it moves, and in [6], the home of each MH is utilized as a centralized information storage.

This paper presents distributed storage management schemes based on a region manager for efficient management of recovery information in the mobile environment. To reduce the recovery cost, a MH should carry its recovery information to the nearby MSS as it moves; however, to keep the failure-free operation cost low, the recovery information should remain dispersed over the MSSs visited by the MH. The region-based scheme takes both concerns into account. For a MH moving within a region, the recovery information is taken care of by the region manager, and the information is transferred to the new region manager only when the MH moves out of the region. By varying the size of a region, the scheme can control the hand-off cost as well as the recovery cost.

2. Mobile Computing System Model

The mobile computing system considered in this paper follows the model presented in [1]. The system consists of a set of MHs and MSSs. A wireless communication link can be established between a MH and a MSS, and a high speed wired communication link is assumed between any two MSSs. The area covered by a MSS is called a cell. Distributed computation is performed by a set of processes running on MHs and is assumed to follow the piece-wise deterministic model, in which a process always produces the same sequence of states whenever the same sequence of message receipt events occurs at the process. The processes are assumed to be fail-stop; that is, in case of a failure, a process stops its execution and does not perform any malicious action.
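The piece-wise deterministic assumption is what makes checkpointing combined with message logging sufficient for recovery: replaying the logged messages in their original order from a checkpoint reproduces the pre-failure state. The following is a minimal sketch of that idea only; the `Process` class, its integer state, and the transition function are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """Toy piece-wise deterministic process (hypothetical): its state depends
    only on the starting state and the ordered sequence of received messages."""
    state: int = 0
    log: List[int] = field(default_factory=list)  # message log (stable storage in practice)

    def receive(self, msg: int) -> None:
        self.log.append(msg)                 # record the message receipt event
        self.state = self.state * 31 + msg   # deterministic state transition


def recover(checkpoint_state: int, logged_msgs: List[int]) -> Process:
    """Restart from a checkpoint and replay the logged messages in order."""
    p = Process(state=checkpoint_state)
    for m in logged_msgs:
        p.receive(m)
    return p


# Failure-free execution: two messages, a checkpoint, then two more messages.
original = Process()
for m in [4, 8]:
    original.receive(m)
checkpoint_state = original.state        # saved state
checkpoint_index = len(original.log)     # messages already reflected in the checkpoint
for m in [15, 16]:
    original.receive(m)

# After a fail-stop failure: replay only the messages logged after the checkpoint.
recovered = recover(checkpoint_state, original.log[checkpoint_index:])
assert recovered.state == original.state
```

Because the state transition depends only on the message sequence, the recovered process reaches the same state as the original one; a process with nondeterministic transitions would not satisfy this property.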
When a MH crosses the boundary between two cells, it ends its current connection by sending a leave(MH-id) message to the old MSS, and then establishes a new connection by sending a join(MH-id, previous MSS-id) message to the new MSS. This procedure is called a hand-off.
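The hand-off exchange can be combined with the region-based rule stated in the Introduction, under which recovery information moves only when a MH leaves its region. The sketch below is one possible reading under stated assumptions, not the authors' implementation: the class names (`RegionManager`, `MSS`, `Network`), the `transfers` counter, and the choice to trigger the transfer inside `join` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class RegionManager:
    """Hypothetical recovery manager for a group of cells (a region)."""
    name: str
    recovery_info: Dict[str, List[str]] = field(default_factory=dict)  # MH-id -> checkpoints/logs


@dataclass
class MSS:
    """Mobile support station serving one cell."""
    mss_id: str
    manager: RegionManager
    connected: Set[str] = field(default_factory=set)


class Network:
    def __init__(self, stations: List[MSS]) -> None:
        self.stations = {s.mss_id: s for s in stations}
        self.transfers = 0  # inter-region transfers of recovery information

    def leave(self, mss_id: str, mh_id: str) -> None:
        """leave(MH-id): the MH ends its connection with the old MSS."""
        self.stations[mss_id].connected.discard(mh_id)

    def join(self, mss_id: str, mh_id: str, previous_mss_id: Optional[str]) -> None:
        """join(MH-id, previous MSS-id): the MH connects to the new MSS."""
        new = self.stations[mss_id]
        new.connected.add(mh_id)
        if previous_mss_id is None:          # first connection, nothing to move
            new.manager.recovery_info.setdefault(mh_id, [])
            return
        old_mgr = self.stations[previous_mss_id].manager
        if old_mgr is not new.manager:       # assumption: transfer is triggered on join
            new.manager.recovery_info[mh_id] = old_mgr.recovery_info.pop(mh_id, [])
            self.transfers += 1

    def handoff(self, mh_id: str, old_id: str, new_id: str) -> None:
        self.leave(old_id, mh_id)
        self.join(new_id, mh_id, old_id)


r1, r2 = RegionManager("R1"), RegionManager("R2")
mss1, mss2, mss3 = MSS("mss1", r1), MSS("mss2", r1), MSS("mss3", r2)
net = Network([mss1, mss2, mss3])

net.join("mss1", "mh1", None)
r1.recovery_info["mh1"].append("ckpt-0")   # a checkpoint taken in region R1

net.handoff("mh1", "mss1", "mss2")         # same region: no transfer
transfers_after_intra = net.transfers
net.handoff("mh1", "mss2", "mss3")         # region boundary crossed: one transfer
```

In this sketch, enlarging a region (assigning more MSSs to the same manager) lowers the number of transfers during hand-offs, which mirrors the trade-off between hand-off cost and recovery cost that the region size is meant to control.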