MongoDB: An open source alternative for HL7-CDA clinical documents management
Gómez Adrián, García Eijó Francisco, Martínez Marcela, Analia Baum, Luna Daniel, González Bernaldo de Quirós Fernán
Health Informatics Department – Hospital Italiano de Buenos Aires – Argentina

Abstract
One of the main needs of health care systems is the ability to manage and process large volumes of data stored in heterogeneous clinical repositories. This issue is a topic of interest in the field of health informatics. This paper describes a new alternative for retrieving clinical information stored in an HL7 Clinical Document Architecture (CDA) repository, using a schema-free NoSQL document database in a laboratory environment.

Keywords
Clinical Information Retrieval, Interoperability, Clinical Document Architecture, Electronic Health Record.

Introduction
The evolution of science applied to information and communication management has produced new storage technologies for heterogeneous and unstructured data, capable of managing complex data structures. These technologies are relevant to the current needs of health information systems, where a patient's clinical record contains multimedia information that is usually composed of contextual data, analytical data and digital images [1]. Communication and information-transfer needs are also present in this area, where new standards for clinical document exchange have brought new challenges to the efficient storage and retrieval of information [2]. Among the organizations behind these standards, we highlight HL7, the most important international organization in this field, whose goal is to establish a general framework for the exchange of medical information [3], [4].
Clinical Document Architecture (CDA) is an XML-based document markup standard, derived from the HL7 Reference Information Model (RIM), that specifies the structure and semantics of clinical documents in order to facilitate information exchange, document management and the integration of data that support patient care [5], [6]. CDA is widely used by health organizations, which has produced large repositories of relevant clinical information stored in a standard format.

The Italian Hospital clinical information system includes an electronic medical record deployed in outpatient, inpatient, emergency and home care settings, consisting of a transactional data repository and a CDA-based clinical data repository, within a health services provider network with 50,000 admissions per year and 2.5 million outpatient visits per year [7]. This scenario raises the problem of efficiently managing and retrieving large data volumes [8], [9].

Several publications address the storage of XML data and the complexities of managing heterogeneous data; others compare XML file processing against object-oriented repositories or native XML databases, reporting favorable results for the database-based technologies, which emerge as the best choice for processing large volumes of information [10], [11], [12]. Based on these studies, we performed a proof of concept using schema-free NoSQL databases as an alternative tool for managing and processing large volumes of heterogeneous clinical information. Legacy systems built on transactional technologies are outside the scope of this study; we focused on the efficient retrieval of clinical information contained in CDA documents.

Materials and Methods
The Italian Hospital clinical data repository is based on the HL7-CDA standard and holds slightly more than 22 million documents.
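Because CDA documents are hierarchical XML, each one can be mapped onto a nested structure in a schema-free document store without first defining a relational schema. The following Python sketch, which is our illustration and not the authors' implementation, converts a CDA fragment into a JSON-compatible dictionary using only the standard library. The "@" prefix for attributes, the "#text" key for element text, and the sample document (with illustrative OIDs and codes) are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix that ElementTree adds to qualified names."""
    return tag.split("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> dict:
    """Recursively convert an XML element into a nested, JSON-compatible dict.

    Assumed convention (not part of HL7 or MongoDB):
    - attributes are stored under '@name' keys, with the namespace prefix dropped;
    - non-blank element text is stored under '#text';
    - child elements become nested dicts, and a repeated tag becomes a list.
    Tail text of mixed content (e.g. inside CDA narrative blocks) is ignored,
    which is a simplification of this sketch.
    """
    node: dict = {"@" + local_name(k): v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        key = local_name(child.tag)
        value = element_to_dict(child)
        if key not in node:
            node[key] = value
        elif isinstance(node[key], list):
            node[key].append(value)
        else:
            # Second occurrence of the same tag: switch to a list.
            node[key] = [node[key], value]
    return node


# Hypothetical, heavily trimmed CDA header (OIDs and values are illustrative only).
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <id root="2.16.840.1.113883.19.5" extension="c266"/>
  <code code="11488-4" codeSystem="2.16.840.1.113883.6.1" displayName="Consultation note"/>
  <title>Consultation note</title>
  <effectiveTime value="20240101"/>
  <recordTarget>
    <patientRole>
      <id root="2.16.840.1.113883.19.5" extension="12345"/>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""

if __name__ == "__main__":
    doc = element_to_dict(ET.fromstring(SAMPLE_CDA))
    print(json.dumps(doc, indent=2))
```

Since the resulting dictionary contains only strings, nested dicts and lists, it could, for example, be stored with the PyMongo driver via `collection.insert_one(doc)` and queried with MongoDB dot notation on a path such as `recordTarget.patientRole.id.@extension`. Dropping namespace URIs from keys also avoids the "." characters they contain, which MongoDB has traditionally restricted in field names.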
From this repository, a sample of 1,000,000 random clinical documents was obtained, defining a