Architecting Resilient Computing Systems: Overall Approach and Open Issues ⋆

Miruna Stoicescu, Jean-Charles Fabre, and Matthieu Roy

CNRS ; LAAS ; 7 avenue du colonel Roche, F-31077 Toulouse, France
Université de Toulouse ; UPS, INSA, INP, ISAE ; UT1, UTM, LAAS

Abstract. Resilient systems are expected to continuously provide trustworthy services despite changes in the environment or in the requirements they must comply with. In this paper, we focus on a methodology to provide adaptation mechanisms meant to ensure dependability while coping with various modifications of applications and system context. To this aim, we propose a representation of dependability-related attributes that may evolve during the system’s lifecycle, and show why this representation is useful to provide adaptation of dependability mechanisms at runtime.

1 Introduction

One of the main challenges nowadays, as stated by IBM in The Vision of Autonomic Computing [12], is managing systems for which the total cost of ownership is ever-increasing as they continuously evolve. The solution to this problem would be for such systems to become autonomous, to a certain extent, and no longer depend on humans for performing basic management tasks.

Autonomic Computing is also enticing for ubiquitous systems based on technologies such as Wireless Sensor Networks. The aim of Autonomic Computing is described in [22] as addressing “today’s concerns of complexity and total cost of ownership while meeting tomorrow’s needs for pervasive and ubiquitous computation and communication”.

Our current work shares this vision while adding a fault tolerance axis. A self-healing system is able to identify when its behaviour deviates from the expected one and to reconfigure in order to correct the deviation. We understand a self-healing system as a context-aware fault tolerant system. To ensure a safe adaptation, a validation step has to be added to this scheme, guaranteeing safety during any reconfiguration.
More precisely, we define the dynamic adaptation, or self-healing, process as a two-step permanent loop consisting of a monitoring service and an adaptation engine. The monitoring service is in charge of observing the system, measuring certain parameters and resource properties and informing the adaptation engine. The latter must analyze the values, compare the observed behaviour to the expected one, decide if an adaptation is needed, choose a reconfiguration strategy and apply it.

⋆ This work is supported by ANR, contract ANR-BLAN-SIMI10-LS-100618-6-01.

E.A. Troubitsyna (Ed.): SERENE 2011, LNCS 6968, pp. 48–62, 2011. © Springer-Verlag Berlin Heidelberg 2011
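
The two-step loop described above, a monitoring service feeding an adaptation engine, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (Observation, MonitoringService, AdaptationEngine, run_loop, the "latency_ms" parameter, the strategy label) is a hypothetical choice made for illustration. The expected behaviour is assumed to be expressible as a per-parameter upper bound. The validation step that the text requires for safe adaptation is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Observation:
    """Snapshot of measured parameters and resource properties."""
    values: Dict[str, float]


class MonitoringService:
    """Step 1: observe the system and report measurements."""

    def __init__(self, probes: Dict[str, Callable[[], float]]):
        # Each probe is a hypothetical zero-argument callable returning a value.
        self.probes = probes

    def observe(self) -> Observation:
        return Observation({name: probe() for name, probe in self.probes.items()})


@dataclass
class AdaptationEngine:
    """Step 2: analyze, compare, decide, choose a strategy and apply it."""
    expected_max: Dict[str, float]            # assumed model of expected behaviour
    strategies: Dict[str, Callable[[], str]]  # hypothetical reconfiguration actions
    log: List[str] = field(default_factory=list)

    def deviations(self, obs: Observation) -> List[str]:
        # Compare observed values to expected bounds; a parameter that was
        # not measured is treated as not deviating.
        return [name for name, limit in self.expected_max.items()
                if obs.values.get(name, 0.0) > limit]

    def step(self, obs: Observation) -> Optional[str]:
        # Decide whether an adaptation is needed.
        for name in self.deviations(obs):
            # Choose the strategy registered for the deviating parameter.
            strategy = self.strategies.get(name)
            if strategy is not None:
                applied = strategy()  # apply the reconfiguration
                self.log.append(applied)
                return applied
        return None  # no deviation, or no strategy available


def run_loop(monitor: MonitoringService, engine: AdaptationEngine,
             iterations: int) -> List[Optional[str]]:
    """Bounded stand-in for the permanent loop: observe, then adapt."""
    return [engine.step(monitor.observe()) for _ in range(iterations)]


if __name__ == "__main__":
    readings = iter([50.0, 250.0, 80.0])
    monitor = MonitoringService({"latency_ms": lambda: next(readings)})
    engine = AdaptationEngine(
        expected_max={"latency_ms": 100.0},
        strategies={"latency_ms": lambda: "switch_to_lighter_mechanism"},
    )
    print(run_loop(monitor, engine, 3))
```

In this sketch the engine adapts only on the second iteration, when the observed latency exceeds the assumed bound. A real system would run the loop continuously and validate each reconfiguration before applying it.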