A Modeling Framework for Self-Healing Software Systems

Michael Jiang, Jing Zhang, David Raymer, and John Strassner

Motorola Network Infrastructure Research Lab, Autonomics Research
{michael.jiang,j.zhang,david.raymer,john.strassner}@motorola.com

Abstract. For a system to be capable of self-healing, it must be able to detect what has gone wrong and determine how to correct it. This paper presents a generic modeling framework to facilitate the development of self-healing software systems. A model-based approach is used to categorize software failures and specify their dispositions at the model level. Self-healing is then achieved by transforming the model of the system into a platform-specific implementation instrumented with failure detection and resolution mechanisms, which mitigate the effect of software failures and maintain the level of healthiness of the system.

Key words: Autonomics, modeling, AOM, model transformation

1 Introduction

When applied to computer-based systems and networks (CBSN), self-healing is one of the capabilities exhibited by autonomic computational systems. These capabilities are often denoted by the phrase self-*, which most often includes self-protecting, self-configuring, self-healing, and self-optimizing. Autonomic computing draws on biological analogies: an autonomic computer is self-governing in the same manner that a biological organism is self-governing [1][2].

A self-healing software system is one that has the ability to discover, diagnose, and repair (or at least mitigate) disruptions to the services that it delivers. In large-scale systems, many different types of faults may exist, and their differing natures often require disparate, tailored approaches to detect them, let alone fix them. Hence, a large-scale self-healing system should also be able to use multiple types of detection, diagnosis, and repair mechanisms.
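
As a purely illustrative sketch of the discover, diagnose, and repair cycle described above, the following Python fragment registers several independent mechanisms, each tailored to one kind of fault. It is not the framework presented in this paper; the names `Fault`, `Mechanism`, and `heal`, and the two example faults, are assumptions introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration only; none of these names come from the paper.


@dataclass
class Fault:
    kind: str    # category assigned during diagnosis
    detail: str  # human-readable description


@dataclass
class Mechanism:
    """One tailored detect/diagnose/repair strategy for a class of faults."""
    name: str
    detect: Callable[[dict], bool]
    diagnose: Callable[[dict], Fault]
    repair: Callable[[dict, Fault], None]


def heal(state: dict, mechanisms: List[Mechanism]) -> List[Fault]:
    """Run each mechanism once and return the faults that were handled."""
    handled = []
    for mechanism in mechanisms:
        if mechanism.detect(state):
            fault = mechanism.diagnose(state)
            mechanism.repair(state, fault)
            handled.append(fault)
    return handled


# Two mechanisms for two different kinds of fault (both invented for this sketch).
mechanisms = [
    Mechanism(
        name="stale-connection",
        detect=lambda s: not s["connected"],
        diagnose=lambda s: Fault("connection", "link is down"),
        # Repair: re-establish the connection.
        repair=lambda s, f: s.update(connected=True),
    ),
    Mechanism(
        name="queue-overflow",
        detect=lambda s: len(s["queue"]) > s["queue_limit"],
        diagnose=lambda s: Fault("overload", "queue exceeds its limit"),
        # Mitigation rather than full repair: drop the excess items.
        repair=lambda s, f: s.update(queue=s["queue"][: s["queue_limit"]]),
    ),
]

state = {"connected": False, "queue": [1, 2, 3, 4, 5], "queue_limit": 3}
handled = heal(state, mechanisms)
```

Keeping each mechanism as a separate record means a new kind of fault can be covered by adding an entry to the list, without touching the loop that drives detection and repair.
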
Autonomic systems extend the above notion of self-healing to include the capability to adapt to changes in the environment, for example, to maintain performance or the availability of resources.

This paper presents a modeling framework to specify and implement self-healing, focusing on the software aspects of a system. Model constructs are used to classify software failures and specify their dispositions separately from the models that capture the base functionality of the system. Self-healing is achieved by transforming base models as well as self-healing models into platform-specific