JPrivacy: A Java Privacy Profiling Framework for Big Data Applications
Mohamed Abdellatif
Department of Computer Science
University of Miami
Coral Gables, USA
m.abdellatif@miami.edu
Iman Saleh
Department of Computer Science
University of Miami
Coral Gables, USA
iman@miami.edu
M. Brian Blake
Department of Computer Science
University of Miami
Coral Gables, USA
m.brian.blake@miami.edu
Abstract—Businesses and government agencies are
continuously generating and collecting huge amounts of data
and building related Big Data applications. Big Data
applications involve the collaborative integration of APIs from
different providers. A challenge in this domain is to guarantee
the conformance of the integration to privacy terms and
regulations. In this paper, we present JPrivacy, a privacy
profiling framework for Big Data applications. JPrivacy
proposes a model for privacy rules and provides the algorithms
and related tools to check Java code against these rules. We
show through experimentation that JPrivacy can effectively
detect privacy violations by statically analyzing a piece of code.
Keywords: Privacy, Big Data, Java, Static Analysis
I. INTRODUCTION
Given the inexpensive nature and availability of
information storage media, individuals worldwide have
exponentially increased their production and persistence of
large amounts of data whether such data are captured as text,
images, or sound. Analysis of these Big Data repositories
introduces fascinating new opportunities for discovering new
insights that contribute to different branches of science. The
potential of Big Data comes, however, with a price: users’
privacy is often at risk. Guarantees of conformance to
privacy terms and regulations are limited in current Big Data
analytics and mining practices. Unlike relational databases
that exhibit a clear structure, Big Data is characterized by its
unstructured nature and the variety of data types including
both textual and audio-visual material. Only the Big Data
applications encapsulate the logic that makes sense of such
unstructured repositories. Hence, our work provides
tools and frameworks for building trusted Big Data applications.
Using our framework, Big Data developers are able to verify
that their code complies with privacy agreements and that
users’ sensitive information is kept private regardless of
changes in the applications and/or privacy regulations. Our
work investigates the following research questions:
- RQ1: How to formally specify privacy? Can we devise
machine-readable privacy rules?
- RQ2: How to extract privacy rules from natural language
descriptions and formalize regulations such as HIPAA?
- RQ3: How can we leverage the formal definition of privacy
to reason about privacy conformance for a piece of code?
- RQ4: How to automatically generate tests from a formal
specification of privacy?
In this paper, we address RQ1 and RQ3. We present
JPrivacy, a privacy profiling system for Java code. JPrivacy
is based on a formal model for privacy rules and provides the
algorithms and related tools to check Java code against these
rules. Figure 1 shows the JPrivacy framework and its main
components. JPrivacy takes as input a Java application and a
natural-language description of privacy terms. It formalizes
the privacy terms and checks the application’s code for
potential violations of these terms. JPrivacy can also
leverage these terms in order to generate test cases. These
test cases guarantee that an application continues to comply
with the privacy regulations as the code and underlying Big
Data repositories evolve.
Figure 1. JPrivacy Framework
II. RELATED WORK
Research has recognized the importance of building
security and privacy measures into software systems. Most
work, however, focuses on the extraction of requirements
from security-related policies and regulations [1][2][3].
Other work extracts formal privacy requirements from legal
regulations such as HIPAA [4][5]. The notion of privacy is
defined in terms of access rights to sensitive data. Once the
requirements are defined, a traditional software engineering
process can take place. Because we consider complex,
collaboratively built Big Data applications, we instead build
privacy considerations into the software engineering process.
III. PRIVACY MODEL
Our proposed privacy model categorizes data from a
privacy standpoint. We identify five categories of data:
Critical Sensitivity (CS), High Sensitivity (HS), Moderate
Sensitivity (MS), Low Sensitivity (LS), and Non-Sensitive
(NS). As the names suggest, these categories associate a
level of sensitivity to a piece of data. Next, we define the
different operations on data that are relevant to a privacy
checker. We identify three categories of operations that can
be done on a data field: reading, writing, and sharing. By
writing a data field, we mean persisting it in a file or a
database. A data field can be written in encrypted or
plaintext format. Sharing a data field, on the other hand,
means passing it as a parameter to third-party software.
This third party can be an API, a Web Service, or simply a
code module. The third party can be trusted or untrusted.
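As an illustration, the categories and operations above could be encoded along the following lines. This is a minimal sketch under stated assumptions, not the JPrivacy implementation: the class, enum, and method names are hypothetical, and the policy inside check (plaintext writes and sharing with untrusted third parties are flagged, with the severity depending on the sensitivity level) is an assumption chosen only to make the example concrete. The WARNING and ERROR labels are borrowed from Figure 1.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not JPrivacy's API.
final class PrivacyModelSketch {

    // The five sensitivity categories from the text, declared from least to most
    // sensitive so that enum ordering can be used for comparisons.
    enum Sensitivity { NS, LS, MS, HS, CS }

    // Operations on a data field. Writing is split by storage format and sharing by
    // whether the third party is trusted, following the distinctions in the text.
    enum Operation { READ, WRITE_ENCRYPTED, WRITE_PLAINTEXT, SHARE_TRUSTED, SHARE_UNTRUSTED }

    // Severity labels borrowed from the WARNING/ERROR markers in Figure 1.
    enum Severity { OK, WARNING, ERROR }

    // One operation on one data field, as a static analysis pass might report it.
    static final class Access {
        final String field;
        final Sensitivity level;
        final Operation op;

        Access(String field, Sensitivity level, Operation op) {
            this.field = field;
            this.level = level;
            this.op = op;
        }
    }

    // ASSUMED policy (not taken from the paper): only plaintext writes and sharing
    // with untrusted parties are flagged; HS and CS data yield ERROR, LS and MS
    // data yield WARNING, and NS data is never flagged.
    static Severity check(Access a) {
        boolean risky = a.op == Operation.WRITE_PLAINTEXT || a.op == Operation.SHARE_UNTRUSTED;
        if (!risky || a.level == Sensitivity.NS) {
            return Severity.OK;
        }
        return a.level.compareTo(Sensitivity.HS) >= 0 ? Severity.ERROR : Severity.WARNING;
    }

    // Builds one report line for every access that is not OK.
    static List<String> report(List<Access> accesses) {
        List<String> lines = new ArrayList<>();
        for (Access a : accesses) {
            Severity s = check(a);
            if (s != Severity.OK) {
                lines.add(s + ": " + a.op + " on " + a.field + " (" + a.level + ")");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Access> observed = new ArrayList<>();
        observed.add(new Access("ssn", Sensitivity.CS, Operation.WRITE_PLAINTEXT));
        observed.add(new Access("zipCode", Sensitivity.LS, Operation.SHARE_UNTRUSTED));
        observed.add(new Access("ssn", Sensitivity.CS, Operation.WRITE_ENCRYPTED));
        report(observed).forEach(System.out::println);
    }
}
```

In this sketch, the field names ssn and zipCode are made-up examples. Declaring the sensitivity levels in increasing order lets a rule express a threshold ("HS or higher") with a single comparison; a real rule language would likely need a richer representation than one hard-coded policy.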
[Figure 1 content: Privacy Terms and a Big Data Application are the inputs to JPrivacy. Labeled components: Privacy Formalization, Machine-Readable Privacy Rules, Code Parser, Syntax Tree, Static Code Analysis, Automatic Test Case Generation. Outputs: Tests and Privacy Violations (WARNING, ERROR). The diagram is annotated with RQ1 to RQ4.]
COLLABORATECOM 2014, October 22-25, Miami, United States
Copyright © 2014 ICST
DOI 10.4108/icst.collaboratecom.2014.257666