Integrity issues in a Grid environment

RACHEL AKIMANA AND OLIVIER MARKOWITCH
Département d’Informatique
Université Libre de Bruxelles
Bd. du Triomphe – CP212, 1050 Bruxelles
BELGIUM
{rakimana, omarkow}@ulb.ac.be

Abstract: Grids are large distributed systems that pool the resources of many computing systems to solve problems requiring heavy computations on large amounts of data. In such a large distributed system, ensuring information integrity is particularly important. Because honest users and potentially malicious entities coexist in this network, the risk of unauthorized alteration of data and information cannot be ignored. Since large amounts of data are stored on Grid resources, assurance must be given that these data are not altered by unauthorized parties. This assurance is provided by the security service called integrity of passive data. A guarantee must also be given that the requested computations are executed correctly, so that the output data they produce are trustworthy. This guarantee is provided by the security service called integrity of active data. In this paper we analyze these integrity concerns and, in particular, identify the requirements that arise when the privacy of passive data is taken into account. The second part of the paper is dedicated to the integrity of active data, for which we propose a framework adapted to the Grid.

Key-Words: Grid security, integrity, privacy

1 Introduction

A Grid is a (widely) distributed system composed of the resources of many computing systems. It is usually used to solve scientific or technical problems that require a large amount of resources. Grids perform heavy computations on large amounts of data by breaking them down into many smaller pieces, or by processing many computations in parallel. A Grid is therefore a parallel and distributed system that allows geographically distributed resources to be shared and aggregated.
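The decomposition described above can be illustrated with a minimal Python sketch. It assumes a simple numeric workload (a sum of squares standing in for a heavy computation) and uses local threads from the standard `concurrent.futures` module as stand-ins for remote Grid nodes. The function names `split_into_chunks`, `process_chunk` and `run_on_grid` are hypothetical, and the sketch does not reflect how any particular Grid middleware dispatches jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Break the input into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk: Sequence[int]) -> int:
    """Hypothetical per-piece job: a sum of squares stands in for heavy work."""
    return sum(x * x for x in chunk)


def run_on_grid(data: Sequence[int], n_workers: int) -> int:
    """Dispatch each chunk to a worker in parallel and aggregate the partial results."""
    chunks = split_into_chunks(data, n_workers)
    # Local threads stand in for remote Grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_grid(list(range(1, 101)), n_workers=4))  # 338350
```

In a real Grid, the chunks would run on separate, geographically distributed machines rather than local threads, and the aggregated result is only as trustworthy as each node that computed a partial result.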
In such a large distributed system, ensuring data integrity is particularly important. Since a Grid is usually a huge system, many different users share its resources, and some of them may be malicious. Therefore, the risk of unauthorized alteration of data and information that are stored or processed on Grid resources, or that travel over the Grid's network, cannot be disregarded.

Large amounts of data are stored on Grid resources. These data are used as input for distributed executions and/or are the results of such executions. It is crucial that they are not illegitimately altered; we therefore have to ensure their integrity. This is what we call the integrity of passive data.

On the other hand, users need a guarantee that the requested executions are processed correctly. Jobs submitted to a Grid have to be executed in the right way with the proper input data, so that the resulting output data are reliable. This is another kind of integrity, which we call the integrity of active data.

In this paper we consider both kinds of integrity concerns. In section 2 we examine the problem of the integrity of passive data, review existing integrity techniques and propose a protocol that ensures the integrity of stored data while also preserving the privacy of the entities related to the information carried by these data. In section 3 we consider the problem of the integrity of active data: we review existing results on the design of fault-tolerant distributed systems and propose a new scheme that fits the particular framework of the Grid.

2 Integrity of passive data

In the context of the Grid, passive data may refer to data resulting from experiments and simulations. These data are generally organized in databases accessible to Grid users. Such data may either belong to known Grid users or remain anonymous.
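To make the integrity requirement on such stored records concrete, the following minimal Python sketch detects unauthorized alteration with a keyed hash (HMAC-SHA256 from the standard `hmac` and `hashlib` modules). It assumes that records are byte strings and that a secret key has already been securely shared with every verifier. The function names `tag_record` and `verify_record` and the sample record are hypothetical illustrations, not part of the protocol proposed in this paper.

```python
import hashlib
import hmac
import secrets


def tag_record(key: bytes, record: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over a stored record (32 bytes)."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify_record(key: bytes, record: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time with the stored one."""
    return hmac.compare_digest(tag_record(key, record), tag)


if __name__ == "__main__":
    # Assumption: this key is distributed out of band to every party that verifies.
    key = secrets.token_bytes(32)
    record = b"experiment=42;temperature=273.15"
    tag = tag_record(key, record)  # stored alongside the record

    print(verify_record(key, record, tag))  # True
    tampered = b"experiment=42;temperature=300.00"
    print(verify_record(key, tampered, tag))  # False
```

Every party that verifies a tag must hold the same secret key, and any such party could also forge valid tags. The check itself is indifferent to whether a record's owner is known or anonymous; what it requires is a shared secret key.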
In both cases, Grid users want assurance that the data they consult have not been altered by unauthorized parties. Hash functions and/or digital signatures are usually used to ensure data integrity. For example, keyed hash functions (MACs) may be applied to database contents when the corresponding secret keys are securely shared. However, secret key management in a large distributed system like a Grid is not straightforward. Digital signature schemes can be used to guarantee the integrity of data whose owner is known (so that the signature can be verified with the owner's public key). However, digital signatures used in the classical way are not an appropriate tool when we deal with the integrity of anony-