Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms

Matthias Blohm¹, Claudia Dukino², Maximilien Kintz², Monika Kochanowski², Falko Koetter² and Thomas Renner²

¹ University of Stuttgart IAT, Institute of Human Factors and Technology Management, Germany
² Fraunhofer IAO, Fraunhofer Institute for Industrial Engineering IAO, Germany

monika.kochanowski@iao.fraunhofer.de, falko.koetter@iao.fraunhofer.de, thomas.renner@iao.fraunhofer.de

Keywords: Natural Language Processing, Artificial Intelligence, Cloud Platform, GDPR, Compliance, Anonymization.

Abstract: Natural language processing in combination with advances in artificial intelligence is on the rise. However, compliance constraints on handling personal data in many types of documents hinder various application scenarios. We describe the challenges of working with personal and particularly sensitive data in practice using three different use cases. We present the anonymization bootstrap challenge that arises when creating a prototype in a cloud environment. Finally, we outline an architecture for privacy compliant AI cloud applications and an anonymization tool. With these preliminary results, we describe future work in bridging privacy and AI.

1 INTRODUCTION

Natural language processing (NLP) is on the rise. Researchers all over the scientific landscape investigate manifold real-world applications. However, in these application scenarios the General Data Protection Regulation (GDPR) (European Union, 2016) is perceived as a major challenge for NLP. This is because, in contrast to tabular data, anonymization by aggregation is not possible for natural language text, as shown in Figure 1. Furthermore, pseudonymization methods can cause information loss. These issues are all the more crucial when cloud-based solutions are considered.
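The contrast between tabular and textual data can be illustrated with a minimal sketch. It is not taken from the paper: the records, the name list and the helper functions `aggregate_by_decade` and `pseudonymize` are hypothetical and serve only to show why aggregation works for rows but not for running text, and how replacing names with placeholders removes information.

```python
from collections import Counter

# Hypothetical tabular records (age, diagnosis); invented for illustration.
records = [(34, "flu"), (37, "flu"), (52, "asthma"), (58, "asthma"), (31, "flu")]


def aggregate_by_decade(rows):
    """Count diagnoses per age decade, discarding the individual rows."""
    return Counter((age // 10 * 10, diagnosis) for age, diagnosis in rows)


# Hypothetical list of known names. A real system would have to detect
# names automatically, which is the hard part for natural language text.
KNOWN_NAMES = ["Anna Schmidt", "Peter Meier"]


def pseudonymize(text, names):
    """Replace each known name found in the text with a numbered placeholder."""
    mapping = {}
    for i, name in enumerate(names, start=1):
        placeholder = f"[PERSON_{i}]"
        if name in text:
            mapping[placeholder] = name
            text = text.replace(name, placeholder)
    return text, mapping


text = "Anna Schmidt, sister of Peter Meier, reported the claim in Stuttgart."
masked, mapping = pseudonymize(text, KNOWN_NAMES)

print(aggregate_by_decade(records))  # group counts only, no individual rows
print(masked)                        # names replaced, remaining text unchanged
```

In this sketch the table can be reduced to group counts, whereas the sentence has no comparable aggregate form. The placeholders hide the names but also anything a reader or a model might have inferred from them, and other potentially identifying details (here the family relation and the city) remain in the text.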
Cloud-based text analysis platforms are a promising way to make automated text analysis widely available, to share knowledge across stakeholders and to reduce tagging workload. However, working with GDPR-relevant data in the cloud is particularly difficult. Thus, there is a growing need for ways to take advantage of cloud solutions while remaining GDPR-compliant.

A solution for automatically dealing with GDPR-relevant data, especially in natural language documents, is often missing. Therefore, anonymization and pseudonymization are done manually. A promising idea is to use artificial intelligence (AI) / machine learning (ML) for anonymizing natural language documents. However, training such a model requires both non-anonymized and anonymized documents. To get around this problem, several options are possible.

This paper is structured as follows. Section 2 describes related work on the topics of natural language processing, anonymization and pseudonymization as well as platforms. Section 3 describes three existing application scenarios: court decisions, healthcare and insurance fraud. Based on these application scenarios, a central research question is derived in Section 4. To answer this question, Section 5 outlines a solution architecture for GDPR-compliant, semi-automated document anonymization as well as an in-progress prototype. Finally, Section 6 summarizes the work and gives an outlook on research in progress.

2 RELATED WORK

We describe related work in three areas: (1) NLP in the GDPR context, (2) anonymization and pseudonymization by artificial intelligence, and (3) platform solutions for NLP.

(1) The possible slowdown of Europe's innovation progress, especially in the field of Text and Data Mining (TDM), due to restrictive data protection and privacy laws is currently an important issue in public discussions (European Commission, 2014).
Blohm, M., Dukino, C., Kintz, M., Kochanowski, M., Koetter, F. and Renner, T.
Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms.
DOI: 10.5220/0007746204540461
In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 454-461
ISBN: 978-989-758-372-8
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

Since