Intentionally-Linked Entities: a database system for health care informatics

(Accepted at BioComp ’14; unrevised version)

Vitit Kantabutra *†‡
kantviti@isu.edu, vkantabu@computer.org

* Dept. Informatics and Computer Science, Idaho State University, Pocatello, Idaho 83209, U.S.A.
† Dept. Electrical Engineering, Idaho State University, Pocatello, Idaho 83209, U.S.A.
‡ Supported by the U.S. National Science Foundation, award no. NSF-0941371; J.B. Owens, V. Kantabutra, D.P. Ames, R. Jones, investigators.

Keywords: ILE, EHR, epidemiology, health care, analytics, health care administration, linked data

Abstract: This paper introduces the Intentionally-Linked Entities (ILE) database system and the possibility of using ILE to more efficiently and accurately manage data in health care information systems. ILE links together data entities in a more robust and efficient way than the Relational database system. ILE also keeps data in a more organized fashion than the Relational or the graph database system, and is capable of expressing more general relationships among data entities than the Object-Oriented database system. All these positive qualities of ILE present the possibility of improving the reliability and correctness of health care databases, and may lead to improved and more efficient patient care and health care database analysis.

I. INTRODUCTION

This paper introduces a database system called Intentionally-Linked Entities (ILE), which links together data entities in a more robust and efficient way than the Relational database system. ILE also keeps data in a more organized fashion than the Relational system, and is more efficient at searches. These positive qualities of ILE present the possibility of improving the reliability and correctness of health care databases, and may therefore lead to improved and more efficient patient care and database analysis.
By storing information more accurately and in a more organized fashion than the Relational database system, ILE may help health care providers avoid mistakes that can compromise patient health, patient confidentiality, or other aspects of good-quality patient care.

A major motivation for developing the ILE database system is the importance of implementing data linking in a better, more robust way. The importance of data linking is eloquently described in Trotter and Uhlman [2013, page 55]:

    Data linking is all about the way data in one part of a patient’s record relates to data in another part of the record. When data linking fails, the data in an EHR for a patient is at war with itself. The simplest way to ensure that data is well-linked is to try and ensure that data is always linked correctly, ....

The authors also pointed out that when the information from a health care database is used for safety-critical decisions such as drug dosing and administration, even a seemingly minuscule error rate such as 0.02% may mean several tragic errors because of the volume of cases handled in a large medical facility.

Patients with the same or similar names are routinely confused in health care facilities. Such confusion may lead to serious health risks such as improper handling of drug allergies or inappropriate medicine or medical/surgical procedures, as well as to a compromise of patient confidentiality or to inefficient or inaccurate health care database analysis. This is why linking errors have to be eliminated to the greatest extent possible.

The Relational database system, first developed by Codd in the early 1970s, remains the most popular type of database system for health care informatics today [Wager et al., 2005]. Despite its name, a significant problem with the Relational database system is relationship linkage, which refers to the physical linking of data entities that are supposed to be related to each other via a relationship.
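The relationship-linkage problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the patient names, the helper `join_by_value`, and the `Patient` class are all hypothetical, and the reference-based variant is a generic pointer-style link included purely for contrast, not the ILE implementation. The sketch shows a link recovered by comparing separately typed field values being lost to a stray trailing blank, while a link stored as a direct object reference is unaffected.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical patient entity that holds its allergy links directly."""
    name: str
    allergies: list = field(default_factory=list)


# Value-based linkage: each table stores the patient name as separately typed text.
patients = [{"name": "Ann Lee"}, {"name": "Bob Ray"}]
allergy_rows = [
    {"patient_name": "Ann Lee ", "allergen": "penicillin"},  # stray trailing blank
    {"patient_name": "Bob Ray", "allergen": "latex"},
]


def join_by_value(patients, allergy_rows):
    """Link allergy rows to patients by exact comparison of the name field."""
    linked = {p["name"]: [] for p in patients}
    for row in allergy_rows:
        if row["patient_name"] in linked:  # exact string match only
            linked[row["patient_name"]].append(row["allergen"])
    return linked


# Reference-based linkage (assumed for contrast): the allergy is attached to the
# patient object itself, so no later text comparison is needed to recover the link.
ann = Patient("Ann Lee")
bob = Patient("Bob Ray")
ann.allergies.append("penicillin")
bob.allergies.append("latex")

by_value = join_by_value(patients, allergy_rows)
print(by_value)       # Ann Lee's penicillin allergy is missing from the join
print(ann.allergies)  # the direct link still carries it
```

In the value-based join the penicillin allergy is silently dropped rather than reported as an error, which is the kind of failure that is hard to notice in a large record set.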
More specifically, the Relational database system determines whether two data entries are the same by comparing data field values. These values are entered separately by the users, often without a stringent check to make sure that entries that are supposed to match actually match each other. The problem is that any misspelling, including the inclusion of blanks or invisible control characters, can cause an absence of linkage. Less likely but possible is the situation where two entries are inadvertently spelled the same and are therefore linked when they should not be.

Another problem with the Relational database system is that it is not a natural means of modeling complex data. Researchers and practitioners have noticed this fact since the 1980s, including in CAD [Stajano, 1998] and in electronic medical records [Speckauskiene and Lukosevicius, 2008]. As stated correctly in Kalet [2014, page 134], the Relational database model is appropriate for use when data logically match the idea of many identically structured records with relatively simple structure, or a collection of such structures. The Relational database system is also well known to suffer