Fault Characterization and Mitigation Strategies in Desktop Cloud Systems

Carlos E. Gómez 1,2 [0000-0002-5202-1167], Jaime Chavarriaga 1 [0000-0002-8372-667X], and Harold E. Castro 1 [0000-0002-7586-9419]

1 Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
2 Universidad del Quindío, Armenia, Colombia
{ce.gomez10,ja.chavarriaga908,hcastro}@uniandes.edu.co

Abstract. Desktop cloud platforms, such as UnaCloud and CernVM, run clusters of virtual machines that take advantage of idle resources on desktop computers. These platforms execute virtual machines alongside the applications started by the users of those desktops. Unfortunately, although this improves the utilization of computer resources, desktop user actions, such as turning off the computer or running certain applications, may conflict with the virtual machines. Desktop clouds commonly run applications based on technologies such as TensorFlow or Hadoop that rely on master-slave architectures and are sensitive to failures in specific nodes. To support these new types of applications, it is important to understand which failures may interrupt the execution of these clusters, which faults may cause these errors, and which strategies can be used to mitigate or tolerate them. Using the UnaCloud platform as a case study, this paper presents an analysis of (1) the failures that may occur in desktop clouds and (2) the mitigation strategies available to improve dependability.

Keywords: Desktop clouds, dependability, reliability, fault analysis, fault tolerance

1 Introduction

Volunteer computing platforms [12], desktop grid systems [8], and desktop clouds (DC) [1] exhibit a lack of dependability and fault tolerance [1][4]. Unlike platforms that rely on dedicated infrastructure, these platforms offer opportunistic services, taking advantage of unused computational capacity in desktop computers.
Such platforms use software agents that detect inactive or idle desktop resources and then execute tasks and applications on them [8][12]. Unfortunately, because users are concurrently present on the same desktop computers, these applications could stop or be affected if

This work has been partially carried out with resources provided by the CYTED co-funded Thematic Network RICAP (517RT0529).