Transparent Cloud Privacy: Data Provenance Expression in Blockchain

Gabriel Hogan and Markus Helfert
ADAPT Centre, Dublin City University, Ireland

Keywords: Cloud, Privacy, Data Provenance, Transparency, W3C, PROV, Blockchain, Distributed Ledger Technologies.

Abstract: The development of Cloud processing and ‘Big Data’ has raised many concerns over the uses to which data is being put. These concerns have created new demands for methodologies and capabilities that can provide transparency and trust in data provenance in the Cloud. Distributed ledger technologies (DLTs) have been proposed as a possible platform to address cloud big data provenance. This paper examines PROV, the W3C recommendation for data provenance, and whether the blockchain DLT can apply the core PROV attributes required to satisfy data provenance. The research shows that not all data provenance expressions can be provided by blockchain. Instances of data provenance which rely on circular references are not possible, as the blockchain DLT is a singly linked list.

1 INTRODUCTION

Provenance is a well-established and well-understood concept which seeks to establish the origin, lineage, history, transactions on, and ownership of an artefact. It has been applied over many centuries in many domains, including art, antiquities, finance, and procurement. Data provenance applies the concept of provenance to the digital data domain. This has application in nearly all current digital domains where data and content are being produced and transacted at an ever-increasing rate, but particularly in the Cloud-based ‘big data’ domain.
The importance of tracking provenance is widely recognized, as witnessed by significant research in various areas including: e-science (Janowicz et al, 2018), (Sigurjonsson, 2018); data warehousing (Hambolu et al, 2016); democratic decision making (Aragón et al, 2014), (Beris and Koubarakis, 2018); e-Health (Masi and Miladi, 2018); digital forensics (Ulybyshev et al, 2018), (Zawoad et al, 2018); security (Cha and Yeh, 2018); news checking (Huckle and White, 2017); and information theory (Lemieux, 2016), (Lemieux and Sporney, 2017). As Cloud-based processing, storage and ‘big data’ have become ubiquitous, privacy concerns have become common to all these areas, raising the same questions that data provenance seeks to answer: where did this data originate, what is its history, and how can these be shown? For ordinary people this has many specific use cases, including identity theft, breach of copyright, digital anonymity, and the ability to see, and gain control over, how their personal data is transacted, used or misused. For organisations collecting and using personal data, the ability to organise, audit, and verify compliance with legislation such as the Sarbanes-Oxley Act (Congress of the United States, 2002), the Health Insurance Portability and Accountability Act (Congress of the United States, 1996), and the Gramm-Leach-Bliley Financial Services Modernization Act (Congress of the United States, 1999) is a key requirement in business today. This creates new challenges for organisational strategies and for the management of data provenance in their Cloud and ‘big data’ infrastructure. The increased public awareness of the use and misuse of big data has raised many privacy concerns, particularly in opaque Cloud-based technologies (Zou, 2016), (Pahl et al, 2018).
These public concerns have resulted in the introduction of specific new legislative concepts and laws which seek to mitigate and address these issues, such as the General Data Protection Regulation (GDPR) (European Commission, 2016), which seeks to not only regulate what data can be used and how it can be used, but also where it can be used. This along with the Payment

Hogan, G. and Helfert, M.
Transparent Cloud Privacy: Data Provenance Expression in Blockchain.
DOI: 10.5220/0007733404300436
In Proceedings of the 9th International Conference on Cloud Computing and Services Science (CLOSER 2019), pages 430-436
ISBN: 978-989-758-365-0
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved