Improving Java Card Grid Dependability with Fault Prevention and Fault Tolerance

Monia Ben Brahim, Maher Ben Jemaa, and Mohamed Jmaiel
ReDCAD Laboratory, National School of Engineers of Sfax, BPW 1173 Sfax, Tunisia
Emails: monia.benbrahim@redcad.org, Maher.benjemaa@enis.rnu.tn, Mohamed.Jmaiel@enis.rnu.tn

Abstract: We present a solution that increases the dependability of a Java Card Grid platform by improving the availability and continuity of the grid service. Our solution is based on fault prevention and fault tolerance. It mainly tolerates hardware faults, communication faults, and faults related to the easy mobility of a smart card. Fault tolerance ensures that the grid provides the appropriate answer to the client despite the failure of the active card service. It is implemented by applying a passive replication strategy at the card service level and by allowing a failed card service to be replaced with a composition of other card services. Fault prevention blocks the use of an unavailable card service so that no recovery is needed. The evaluation of execution times shows that the overhead of the added software layer is acceptable both when there is no fault and when recovery is made by replication. However, the overhead is much larger when recovery is made by composition.

I. INTRODUCTION

An inherent problem of distributed systems is their dependability [4], [16]. Dependability is the property that allows system users to place justified confidence in the delivered service. It covers several requirements, such as the reliability, availability, maintainability, and security of the system.

The Java Card Grid platform (JCG) [3], [7], [10] is a distributed platform that is particularly secure because it is based on the Java Card technology [12], [17].
However, it does not handle faults related to cards and their readers, such as a malfunction of the card or its reader, or their careless or malicious removal. When these faults occur at execution time, they affect the grid state and lead to an omission failure of the JCG platform, which becomes unable to answer the client's request. Even when they occur while cards are inactive, they may affect service availability in the grid. The dependability of the JCG platform has been studied only from the security angle [3], [7], [9], [10], [15], and to our knowledge no prior work has addressed reliability and availability issues in a JCG.

In this paper, we present a solution that increases the dependability of a JCG platform by improving the availability and continuity of the grid service. Our solution is based on fault prevention and fault tolerance. It handles the fault by preventing the reuse of a failed service. It also handles the error by ensuring that the grid provides the appropriate answer to the client despite the failure of the active card service. Error handling is performed by applying a passive replication strategy at the card service level and by extending this strategy with the possibility of replacing the failed card service with a composition of other card services. Recovery by composition is possible when the services held in the grid are numerous and diverse enough to allow the composition of card services of varying complexity.

From the passive replication strategy we keep replication itself and the passivity of the replicas. Since the card services we deal with in this work are stateless, synchronizing the replicas is unnecessary. The possibility of replacing a failed card service with a composition of services makes it possible to tolerate the fault even if all secondary replicas are unavailable (failed, withdrawn from the grid, or already active).
It also makes it possible to favor recovery by composition when replication is concentrated on elementary card services.

The fault tolerance layer that we designed and implemented includes three software components: the recovery manager, the global service registry, and the message monitor. The interaction between these components is transparent to the user and ensures a fault-tolerant execution of the user's application on the JCG platform. The layer is also designed to deal with faults that may occur while a previous error is being handled.

In addition, we improved the architecture of the global service registry in order to prevent the use of an unavailable card service. We implemented prevention mechanisms such as event monitoring and mutually exclusive access.

Our evaluation tests show that our approach keeps the execution time overhead small when there is no fault, and keeps the execution time proportionally acceptable compared to the initial one when recovery is based on replication. However, recovery by composition, although inexpensive in terms of replicated resources, appears costly in terms of execution time.

This paper is organized as follows. Section 2 describes the JCG platform that we deployed. Section 3 enumerates the faults that we tolerate in the JCG and presents the fault tolerance layer. Section 4 presents the faults to prevent and the different prevention procedures. Section 5 discusses the evaluation results of our approach. Section 6 presents related work and the necessary background. Finally, we conclude and present perspectives for our work.
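
The recovery order outlined in this introduction (try the replicas of a stateless card service, stop reusing a replica once it has failed, and fall back to a composition of other services when no replica answers) can be illustrated with a minimal Java sketch. It is only an illustration under stated assumptions and not the implementation described in this paper: `RecoverySketch`, `CardService`, `CardFailure`, and the `"add"` and `"mul"` services are hypothetical names, the services are plain in-memory objects rather than applets on smart cards, and replicas that are "already active" are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recovery manager: NOT the JCG API, only an illustration.
class RecoverySketch {
    // Replicas of each stateless service, keyed by service name.
    private final Map<String, List<CardService>> replicas = new HashMap<>();
    // Optional substitute built from other services, keyed by service name.
    private final Map<String, CardService> compositions = new HashMap<>();

    void register(String name, CardService replica) {
        replicas.computeIfAbsent(name, k -> new ArrayList<>()).add(replica);
    }

    void registerComposition(String name, CardService composed) {
        compositions.put(name, composed);
    }

    int replicaCount(String name) {
        List<CardService> list = replicas.get(name);
        return list == null ? 0 : list.size();
    }

    int invoke(String name, int a, int b) throws CardFailure {
        List<CardService> candidates = replicas.getOrDefault(name, new ArrayList<>());
        // Iterate over a copy so that failed replicas can be dropped safely.
        for (CardService replica : new ArrayList<>(candidates)) {
            try {
                return replica.call(a, b);      // first replica that answers wins
            } catch (CardFailure e) {
                candidates.remove(replica);     // never reuse a failed replica
            }
        }
        // No replica answered: fall back to a composition, if one is known.
        CardService composed = compositions.get(name);
        if (composed != null) {
            return composed.call(a, b);
        }
        throw new CardFailure("no replica or composition available for " + name);
    }

    public static void main(String[] args) throws CardFailure {
        RecoverySketch manager = new RecoverySketch();
        // The primary "add" replica fails (for example, its card was removed).
        manager.register("add", (a, b) -> { throw new CardFailure("card extracted"); });
        manager.register("add", (a, b) -> a + b);
        // "mul" has no replica; it is composed from repeated "add" calls.
        manager.registerComposition("mul", (a, b) -> {
            int acc = 0;
            for (int i = 0; i < b; i++) {
                acc = manager.invoke("add", acc, a);
            }
            return acc;
        });
        System.out.println(manager.invoke("add", 2, 3));
        System.out.println(manager.invoke("mul", 4, 3));
    }
}

// Hypothetical stand-in for a stateless card service.
interface CardService {
    int call(int a, int b) throws CardFailure;
}

// Hypothetical exception signalling that a card or its reader did not answer.
class CardFailure extends Exception {
    CardFailure(String message) { super(message); }
}
```

In this sketch the composed `"mul"` service issues one `"add"` request per iteration, which gives a rough intuition for why recovery by composition costs more execution time than switching to a replica.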