662 International Journal on Advances in Intelligent Systems, vol 7 no 3&4, year 2014, http://www.iariajournals.org/intelligent_systems/
2014, © Copyright by authors, Published under agreement with IARIA - www.iaria.org

Representing and Publishing Cyber Forensic Data and its Provenance Metadata: From Open to Closed Consumption

Tamer Fares Gayed, Hakim Lounis
Dépt. d’Informatique, Université du Québec à Montréal
Succursale Centre-ville, H3C 3P8, Montréal, Canada
gayed.tamer@courrier.uqam.ca, lounis.hakim@uqam.ca

Moncef Bari
Dépt. de Didactique, Université du Québec à Montréal
Succursale Centre-ville, H3C 3P8, Montréal, Canada
bari.moncef@uqam.ca

Abstract—Role players in any forensic investigation chronologically record all forensic data resulting from their investigation so that it can be presented to the juries in a court of law. Once recorded and posted, these results are called chains of custody (CoCs). The forensic data in these documents play a vital role in the investigation process because they answer questions about how evidence is collected, transported, analyzed, and preserved from its seizure through its production in court. Provenance metadata accompany the forensic data to answer questions about the origin of these data and to build trust between role players and juries, so that the tangible CoCs are admissible in a court of law. With the advent of the digital age, forensic investigation is applied not only to physical crime but also to digital evidence. The forensic data and metadata presented in these tangible documents must therefore undergo a radical transformation from paper to electronic form to accommodate this evolution. CoCs should also be readable and consumable by machines as well as by humans.
The semantic web is fertile ground for representing and managing the tangible CoCs because it relies on web principles known as Linked Data Principles (LDP), which provide useful information in Resource Description Framework (RDF) format upon Uniform Resource Identifier (URI) resolution. In addition, it includes different provenance vocabularies that can be used to express the forensic metadata. In general, the power of LDP resides in publishing data publicly on the web without any access restriction. Forensic data and their metadata, however, should not be open in this way. They should be subject to access restrictions so that they are shared only between role players and juries. Public Key Infrastructure (PKI) can be applied to restrict access to some or all resources of the represented data, shifting LDP from open to closed consumption while maintaining the resolution of the restricted resources. Juries, in turn, consume the restricted data using different LDP consumption applications. This paper provides the complete framework explaining how forensic and provenance data are represented and published using LDP, and how PKI can be used to restrict these data/resources so that they are shared on a closed scale. Evaluation of the framework through empirical experiments is outside the scope of this paper.

Keywords-Linked Open Data, Linked Data Principles, Linked Closed Data, Public Key Infrastructure, Digital Certificates, Cyber Forensics, Chain of Custody.

I. INTRODUCTION

The history of forensic investigation dates back thousands of years. Its task is to gather and examine evidence about the past in order to prosecute the criminal in a court of law. With the advent of Information and Communication Technology (ICT), forensic investigation concentrates not only on physical crime but also on digital evidence.
This gave rise to a new type of forensic investigation known as computer/cyber/digital forensics. It combines computer science concepts, including computer architecture, operating systems, file systems, software engineering, and computer networking, with legal procedures. At the most basic level, the digital forensic process has three major phases: extraction, analysis, and presentation. The extraction phase (also known as acquisition) saves the state of the digital source (e.g., laptops, desktop computers, mobile phones, or any other digital device) and creates an image by saving all digital values so that it can be analyzed later [1]. The analysis phase takes the acquired data (e.g., file and directory contents, and recovered deleted contents), examines it to identify pieces of evidence, and draws conclusions based on the evidence found. During the presentation phase, the audience is typically the judges; the conclusions and corresponding evidence from the investigation analysis are presented to them [2][3]. However, other models of the cyber forensic process exist, each of which relies on reaching a consensus about how to describe digital forensics and evidence [4][5]. Investigation models are numerous, and many works have explained and compared them [6][7][8][9]. Table I shows the current digital forensic models. Each row of the table presents the name of a digital forensic process model, while the columns present the processes included in each model [5][10].

Role players such as first responders, investigators, expert witnesses, prosecutors, and police officers may be assigned one or more phases of the forensic process. They are responsible for creating and recording their own investigation results and posting them in tangible documents.
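
As an illustration only, the relationship between role players, phases, and chronologically recorded results can be sketched in Python. This is a minimal sketch under stated assumptions, not the framework proposed in this paper: the class names (Phase, CustodyEntry, ChainOfCustody), their fields, and the sample case identifier are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional, Set


class Phase(Enum):
    """The three basic phases of the digital forensic process."""
    EXTRACTION = "extraction"      # also known as acquisition
    ANALYSIS = "analysis"
    PRESENTATION = "presentation"


@dataclass(frozen=True)
class CustodyEntry:
    """One recorded investigation result (hypothetical structure)."""
    phase: Phase
    role_player: str
    result: str
    recorded_at: datetime


@dataclass
class ChainOfCustody:
    """A chain of custody as a list of recorded entries (hypothetical structure)."""
    case_id: str
    entries: List[CustodyEntry] = field(default_factory=list)

    def record(self, phase: Phase, role_player: str, result: str,
               recorded_at: Optional[datetime] = None) -> CustodyEntry:
        # Default to the current UTC time when no timestamp is supplied.
        when = recorded_at or datetime.now(timezone.utc)
        entry = CustodyEntry(phase, role_player, result, when)
        self.entries.append(entry)
        return entry

    def chronological(self) -> List[CustodyEntry]:
        # Entries ordered by the time they were recorded.
        return sorted(self.entries, key=lambda e: e.recorded_at)

    def phases_of(self, role_player: str) -> Set[Phase]:
        # A role player may be assigned one or more phases.
        return {e.phase for e in self.entries if e.role_player == role_player}


if __name__ == "__main__":
    coc = ChainOfCustody(case_id="case-001")  # hypothetical identifier
    coc.record(Phase.EXTRACTION, "first responder", "disk image created")
    coc.record(Phase.ANALYSIS, "investigator", "deleted files recovered")
    for e in coc.chronological():
        print(e.recorded_at.isoformat(), e.phase.value, e.role_player, e.result)
```

The sketch keeps each entry immutable once recorded, which loosely mirrors the idea that a posted investigation result should not be altered afterwards; how the actual framework represents and protects such records is a separate matter.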