J. Parallel Distrib. Comput. 71 (2011) 1356–1366

A maximum independent set approach for collusion detection in voting pools

Filipe Araujo a,*, Jorge Farinha a, Patricio Domingues b, Gheorghe Cosmin Silaghi c, Derrick Kondo d

a CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
b Research Center for Informatics and Communications, School of Technology and Management of the Polytechnic Institute of Leiria, Portugal
c Department of Business Information Systems, Babeș-Bolyai University, Cluj, Romania
d Laboratoire d’Informatique de Grenoble, INRIA Rhône-Alpes, Grenoble, France

Article history: Received 21 May 2010; received in revised form 31 May 2011; accepted 16 June 2011; available online 24 June 2011

Keywords: Collusion detection; Maximum independent set; Volunteer computing

Abstract

From agreement problems to replicated software execution, we frequently find scenarios with voting pools. Unfortunately, Byzantine adversaries can join and collude to distort the results of an election. We address the problem of detecting these colluders, in scenarios where they repeatedly participate in voting decisions. We investigate different malicious strategies, such as naïve or colluding attacks, with fixed identifiers or in whitewashing attacks. Using a graph-theoretic approach, we frame collusion detection as a problem of identifying maximum independent sets. We then propose several new graph-based methods and show, via analysis and simulations, their effectiveness and practical applicability for collusion detection.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

We often find situations where people vote to decide on some course of action.
Similarly, processes in distributed systems may need to vote or compare results for a number of reasons, ranging from replica management [15] to many agreement problems [21,13], including replication of software components in safety-critical systems [22]. Running several replicas written by different teams on different hardware reduces the likelihood of correlated failures. In [31], the authors propose using voting replicas to tolerate not only failures, but also intrusions by attackers able to manipulate, or even fully control, a replica. In eBay and other e-commerce sites, we find recommender systems where users give feedback on each other, a process we can regard as voting [1,29]. The BOINC middleware [2], used notably by the SETI@home project, provides another example of voting, in distributed volunteer computing. Workers download workunits from the BOINC server, compute them, and send their results back. However, since volunteers are unreliable, the central supervisor assigns the same workunit to two or more workers. These workers form voting pools, which the central supervisor decides according to the majority of results.

* Corresponding author. E-mail addresses: filipius@dei.uc.pt, filipius3@gmail.com (F. Araujo).

Unfortunately, workers may resort to several forms of manipulation to negatively affect the decisions of the voting pools. In this paper, we tackle the problem of detecting colluding nodes that jointly sabotage these pools, in BOINC-like scenarios. We assume that nodes successively gather in different pools to vote. This enables the central supervisor to collect tallies, based on the history of each node. Additionally, one or more subsets of the nodes, which we name “malicious” or “incorrect”, may compute wrong results, and may even collude (i.e., we consider particular cases of Byzantine behaviors [21]).
Some malicious nodes, which we term “naïve malicious”, act alone, e.g., because they have faulty hardware. Unlike these, colluder nodes can communicate using out-of-band mechanisms to determine whether or not they should produce an erroneous result. We consider several types of colluder nodes: some always produce wrong results when they are in the majority, while others may betray their peer colluders, or refrain from cheating, to disguise their behavior from the central supervisor. Moreover, since it is often impossible to ensure that each worker presents only one identifier to the system, we also consider colluder nodes capable of performing whitewashing attacks [11]. In this case, they may leave and later rejoin the system with a fresh identifier, to circumvent blacklists.

In this paper, we propose a number of collusion detection algorithms that aim to uncover malicious nodes. We base these algorithms on a graph defined by the votes that nodes cast against each other, the Votes Against Graph, and on the Maximum Independent Set Problem (MIS). This enables the central supervisor to take corrective measures to ensure the validity of results. For example, it may request additional workers for voting pools whose outcome is possibly incorrect.

doi:10.1016/j.jpdc.2011.06.004
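
To make the graph formulation above concrete, the following is a minimal sketch and not one of the algorithms proposed in this paper. It rests on assumptions that this section does not state: two nodes are joined by an edge if they ever submitted different results in the same pool, correct nodes never disagree with each other, and correct nodes form the largest such group. Under these assumptions the correct nodes form an independent set in the graph. The function names, the pool representation, and the sample data are hypothetical, and the exhaustive search is exponential in the number of nodes, so it only suits toy inputs.

```python
from itertools import combinations


def build_votes_against_graph(pools):
    """Return an adjacency map {node: set of nodes it disagreed with}.

    Assumed input: each pool is a dict mapping a node identifier to the
    result that node submitted in that pool.
    """
    graph = {}
    for pool in pools:
        for node in pool:
            graph.setdefault(node, set())
        # Assumed edge rule: differing results in one pool create an edge.
        for a, b in combinations(pool, 2):
            if pool[a] != pool[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def maximum_independent_set(graph):
    """Exhaustive search for a largest set of pairwise non-adjacent nodes."""
    nodes = sorted(graph)
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(nodes, size):
            if all(b not in graph[a] for a, b in combinations(candidate, 2)):
                return set(candidate)
    return set()


# Hypothetical history: h1..h4 behave correctly; c1 and c2 collude.
pools = [
    {"h1": 42, "h2": 42, "c1": 7},
    {"h2": 13, "h3": 13, "c2": 99},
    {"h1": 5, "c1": 8, "c2": 8},    # colluders in majority agree on a wrong result
    {"h3": 21, "h4": 21, "c1": 21}, # c1 refrains from cheating in this pool
]

graph = build_votes_against_graph(pools)
trusted = maximum_independent_set(graph)
suspects = set(graph) - trusted
print(sorted(trusted), sorted(suspects))
```

In this toy history, the third pool would be decided in favor of the colluders by majority voting alone, yet the accumulated disagreements still separate the two groups. With fewer pools or more cautious colluders the largest independent set need not be unique, which is one reason a practical detector needs more than this brute-force search.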