EJBI – Volume 16 (2020), Issue 1

Research

A Detection of Informal Abbreviations from Free Text Medical Notes Using Deep Learning

Lukman Heryawan 1, Osamu Sugiyama 2, Goshiro Yamamoto 3, Purnomo Husnul Khotimah 4, Luciano H. O. Santos 1,2,3, Kazuya Okamoto 1,2,3, Tomohiro Kuroda 1,2,3

1 Graduate School of Informatics, Kyoto University, Japan
2 Graduate School of Medicine, Kyoto University, Japan
3 Kyoto University Hospital, Japan
4 Research Center for Informatics, Indonesian Institute of Sciences, Indonesia

Citation: Heryawan L, et al. (2020). A Detection of Informal Abbreviations from Free Text Medical Notes Using Deep Learning. EJBI. 16(1): 29-37
DOI: 10.24105/ejbi.2020.16.1.29
Received: May 15, 2020
Accepted: May 26, 2020
Published: June 02, 2020

Correspondence to: Dr. Lukman Heryawan, Department of Social Informatics, Graduate School of Informatics, Kyoto University Hospital, 54 Kawahara, Shogoin, Sakyo, Kyoto, Japan
E-mail: lukman@kuhp.kyoto-u.ac.jp

1. Introduction

Medical data need to be structured to achieve semantic interoperability. Semantic interoperability is essential for Electronic Medical Records (EMR) since they must serve as a seamless communication platform, allowing data to be compatible whenever a patient migrates from one physician to another [1,2]. Semantic interoperability ensures that the meaning of medical concepts can be shared across systems, thus providing a digital and common language for medical terms that is understandable to humans and machines. For instance, the sentence “patient g2-p2 experiences asthma attack” includes information about pregnancy history, in the form of the abbreviated term “g2-p2”. If the same sentence were written using a standard for semantic interoperability, such as SNOMED CT [3], the abbreviation “g2-p2” would be replaced by “gravida 2 or second pregnancy and para 2 or parity 2”.
In the abbreviated sentence, if the term “g2-p2” were not detected by a healthcare information system that employs SNOMED CT, such as a Computerized Physician Order Entry (CPOE) system, the person responsible for processing the medication order might misunderstand or fail to recognize it, leading to an erroneous

Abstract

Background: To parse free text medical notes into structured data such as disease names, drugs, procedures, and other important medical information, it is first necessary to detect medical entities. It is important for an Electronic Medical Record (EMR) to have structured data with semantic interoperability so that it can serve as a seamless communication platform whenever a patient migrates from one physician to another. However, in free text notes, medical entities are often expressed using informal abbreviations. An informal abbreviation is a non-standard or undetermined abbreviation, written in diverse styles, which may hinder semantic interoperability between EMR systems. Therefore, detecting informal abbreviations is required to tackle this issue.

Objectives: We attempt to achieve highly reliable detection of informal abbreviations written in diverse styles.

Methods: In this study, we apply the Long Short-Term Memory (LSTM) model to detect informal abbreviations in free text medical notes. Additionally, we use sliding windows to tackle the limited data issue and a sample generator to address the class imbalance issue, while introducing additional pre-trained features (bag of words and word2vec vectors) to the model.

Results: The LSTM model was able to detect informal abbreviations with a precision of 93.6%, a recall of 57.6%, and an F1-score of 68.9%.

Conclusion: Our method was able to recognize informal abbreviations with high precision using a small data set.
The detection can be used to recognize informal abbreviations in real time while the physician is typing them and to raise appropriate indicators prompting confirmation of the abbreviation's meaning, thus increasing semantic interoperability.

Keywords: Informal abbreviations; LSTM; Structured data; Free text medical notes; EMR
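
The Methods summary mentions sliding windows as a way to cope with limited data. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes whitespace tokens with one binary label per token (1 = informal abbreviation), and the function name `sliding_windows`, the window `size`, and the `stride` are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

Window = Tuple[List[str], List[int]]


def sliding_windows(tokens: List[str], labels: List[int],
                    size: int = 5, stride: int = 1) -> List[Window]:
    """Cut one labeled token sequence into overlapping fixed-size windows.

    Hypothetical helper: the window size and stride are illustrative
    assumptions, not values reported in the paper.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    # A sequence no longer than one window is returned as a single sample.
    if len(tokens) <= size:
        return [(list(tokens), list(labels))]

    windows: List[Window] = []
    # With stride > 1, trailing tokens that do not fill a window are skipped.
    for start in range(0, len(tokens) - size + 1, stride):
        end = start + size
        windows.append((tokens[start:end], labels[start:end]))
    return windows


# Example sentence taken from the Introduction; the labels are assumed.
example_tokens = "patient g2-p2 experiences asthma attack".split()
example_labels = [0, 1, 0, 0, 0]

for window_tokens, window_labels in sliding_windows(
        example_tokens, example_labels, size=3, stride=1):
    print(window_tokens, window_labels)
```

In this sketch a single note yields several overlapping training samples, and a labeled abbreviation can appear in more than one window, which is one way a small corpus can be stretched into more training examples.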