Extracting Body Function from Clinical Text

Guy Divita 1, Jessica Lo 1, Chunxiao Zhou 1, Kathleen Coale 1 and Elizabeth Rasch 1

1 Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA

Abstract

This paper describes finding Body Function (BF) mentions within clinical text. Body Function is noted in clinical documents to provide information on potential pathologies within underlying body systems or structures. BF mentions are embedded in highly formatted structures whose formats include implied scoping boundaries that confound existing NLP segmentation and document decomposition techniques. We have created two extraction systems: a rule-based version built on dictionary lookup, and a conditional random field (CRF) approach trained on manual annotations. Training and test data were drawn from NIH Clinical Center Rehabilitation Medicine Department records. Results from these systems provide a baseline for future work to improve document decomposition techniques.

Keywords

Natural Language Processing, Body Function, ICF

1. Introduction

Body functions are the physiological or psychological functions of body systems [1]. Body functions are mentioned in clinical text when there is concern about, or documentation of, pathologies related to body function or body function assessment. Body Function information is commonly collected during physical exams to provide information on potential pathologies within underlying body systems or structures.

Our motivation came from a request from the Social Security Administration to retrieve BF mentions within their documents, in support of existing efforts to enhance their disability claims adjudication process.
While there is a question around the utility of body function information as it relates to disability adjudications, we are motivated to work on this task as a way to improve the algorithms that support BF extraction, namely the sectionizing, sentence chunking, and context scoping annotators, with BF mentions as the use case. BF mentions are often embedded in complex formatted text in the form of lists, slot-values, and oddly punctuated sentences in clinical notes. This paper reports on the systems developed to capture this information before any improvements are made to the document decomposition tasks.

Our conceptual framework for BF comes from the International Classification of Functioning, Disability and Health (ICF) [2]. While there are many specific kinds of body function, we set out to find mentions of strength, range of motion (ROM), and reflexes because of their relevance to the current disability adjudication business process. Within these mentions, we label the body function type (strength, range of motion, reflex), the body location, and any associated qualifiers.

2. Prior Work

There is little prior work that specifically extracts body function from clinical notes. Some work has been done on extracting other ICF-defined areas using traditional rule-based techniques as well as deep learning methods. Kukafka, Bales, Burkhardt and Friedman report on modifying MedLEE to automatically identify five ICF codes from rehabilitation discharge summaries [3]. Newman-Griffis and Fosler-Lussier describe linking physical activity reports to ICF codes using more recent language models and embeddings [4].

Use of this content permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The NLP platform employed for this work was adapted from the V3NLP Framework [5] and Sophia [6], which were used for symptom extraction and finding mentions of sexual trauma in veteran