Using Data Provenance to Measure Information Assurance Attributes

Abha Moitra, Bruce Barnett, Andrew Crapo
General Electric Global Research
Niskayuna, NY

Stephen J Dill
Lockheed Martin IS&GS
Frederick, MD

Abstract

Data Provenance is multi-dimensional metadata that specifies Information Assurance attributes such as Confidentiality, Authenticity, Integrity, and Non-Repudiation. It may also include ownership, processing details, and other attributes. Further, each Information Assurance attribute may itself have sub-components, such as objective and subjective values, or application security versus transport security. Traditionally, Information Assurance attributes have been specified probabilistically as a belief value (or corresponding disbelief value) in that attribute. In this paper we introduce a framework based on Subjective Logic that incorporates uncertainty by representing values as a triple of <belief, disbelief, uncertainty>. This framework also allows us to work with conflicting Information Assurance attribute values that may arise from multiple views of an object. We also introduce a formal semantic model for specifying and reasoning over Information Assurance properties in a workflow.

Data Provenance information can grow substantially as both the amount of information kept for each object and the complexity of the workflow increase. In such situations, it may be necessary to summarize the Data Provenance information. Further, the summarization may depend on the Information Assurance attributes as well as on the type of analysis used for Data Provenance. We show how such summarization can be done and how it can be used to generate a trust value for the data. We also discuss how the Information Assurance values can be visualized.

Introduction

Our primary interest is in calculating the assurance in data used.
One of the components used to calculate this is the set of Information Assurance (IA) communication attributes, which include confidentiality, integrity, authenticity, non-repudiation, and availability. Factors that affect this calculation include opinions of the data sources and of the certificate authorities used during the authentication process. These values are based on the observer’s viewpoint, loyalties, and knowledge, and are therefore highly subjective. For simplicity, we do not address these factors in this paper. Instead, we focus on the information assurance attributes of the communication itself, that is, those related to the communication channel and process. If all parties agree on the relative strength of cryptographic algorithms at a given point in time, then this agreement forms the basis for an objective and consistent measurement of information assurance values across multiple parties for a set of messages.
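
To make the <belief, disbelief, uncertainty> representation concrete, the following is a minimal Python sketch of a Subjective Logic opinion. It is an illustration under stated assumptions, not the framework developed in this paper: the names `Opinion`, `from_probability`, `expected`, and `consensus` are hypothetical, the default base rate of 0.5 is an assumption, and the fusion rule shown is the standard consensus operator from the Subjective Logic literature, which may differ from the operator used here for conflicting views.

```python
from dataclasses import dataclass

TOL = 1e-9  # tolerance for floating-point comparisons


@dataclass(frozen=True)
class Opinion:
    """Hypothetical container for a <belief, disbelief, uncertainty> triple."""
    belief: float
    disbelief: float
    uncertainty: float

    def __post_init__(self):
        parts = (self.belief, self.disbelief, self.uncertainty)
        # Each component must lie in [0, 1] and the three must sum to 1.
        if any(p < -TOL or p > 1.0 + TOL for p in parts):
            raise ValueError("components must lie in [0, 1]")
        if abs(sum(parts) - 1.0) > TOL:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    @classmethod
    def from_probability(cls, p):
        # A traditional probabilistic belief value carries no uncertainty.
        return cls(p, 1.0 - p, 0.0)

    def expected(self, base_rate=0.5):
        # Probability expectation: belief plus a share of the uncertainty.
        # The base rate of 0.5 is an assumed default.
        return self.belief + base_rate * self.uncertainty


def consensus(a, b):
    """Fuse two opinions about the same attribute (standard consensus operator)."""
    k = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if k == 0.0:
        # Both opinions are fully certain; this sketch does not handle that limit.
        raise ValueError("cannot fuse two opinions with zero uncertainty")
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / k,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / k,
        (a.uncertainty * b.uncertainty) / k,
    )


# Two views of the integrity of the same object, with differing opinions.
view_1 = Opinion(0.7, 0.1, 0.2)
view_2 = Opinion(0.5, 0.3, 0.2)
fused = consensus(view_1, view_2)
```

In this example the fused opinion is approximately <0.667, 0.222, 0.111>: the two views are combined into a single triple, and the uncertainty falls below that of either input because two observations support the result.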