Open access
Original research

Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study

Carlton R Moore,1 Saumya Jain,2 Stephanie Haas,3 Harish Yadav,3 Eric Whitsel,2 Wayne Rosamand,2 Gerardo Heiss,2 Anna M Kucharska-Newton2

To cite: Moore CR, Jain S, Haas S, et al. Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study. BMJ Open 2021;11:e047356. doi:10.1136/bmjopen-2020-047356

Prepublication history and supplemental material for this paper is available online. To view these files, please visit the journal online (http://dx.doi.org/10.1136/bmjopen-2020-047356).

Received 28 November 2020
Accepted 05 May 2021

For numbered affiliations see end of article.

Correspondence to Dr Carlton R Moore; crmoore@med.unc.edu

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

ABSTRACT
Objectives Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype.

Study design A retrospective observational study design of patients hospitalised in 2015 from four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype.

Setting Four ARIC study hospitals, each representing an ARIC study region in the USA.
Participants A stratified random sample of hospitalisations identified using a broad range of International Classification of Diseases, ninth revision, diagnostic codes indicative of an HF event and occurring during 2015 was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations were used as the validation dataset.

Intervention Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype.

Primary and secondary outcome measures NLP performance as measured by sensitivity, specificity, positive-predictive value (PPV) and agreement in ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard.

Results Overall, performance of NLP ascertainment of Framingham HF phenotype in the validation dataset was good, with 78.8%, 81.7%, 84.4% and 80.0% for sensitivity, specificity, PPV and agreement, respectively.

Conclusions By decreasing the need for manual chart review, our results on the use of NLP to ascertain Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology holds the potential for significantly improving the feasibility and efficiency of conducting large-scale epidemiologic surveillance of HF prevalence and incidence.

Strengths and limitations of this study
The article describes the first study to evaluate performance of natural language processing (NLP) using free-text clinical notes and reports stored in electronic health records to ascertain Framingham heart failure phenotype in multiple regionally dispersed hospitals in the USA with different health systems.
NLP performance (sensitivity, specificity, positive-predictive value and agreement) is assessed against a reference standard of manual extraction of prespecified information by trained and certified abstractors, using a highly standardised protocol, with quality assurance programmes in place that monitored accuracy, completeness and repeatability of the process.
The NLP programme used open-source software (clinical Text Analysis and Knowledge Extraction System and Python).
A limitation of the study is that it includes only a subset of hospitalised patients at risk for acute decompensated heart failure based on diagnostic codes (International Classification of Diseases, ninth revision) and therefore is not representative of the general hospitalised population.

BMJ Open: first published as 10.1136/bmjopen-2020-047356 on 14 June 2021.

INTRODUCTION
Since the passage of the Health Information Technology for Economic and Clinical Health Act in 2009,1 the use of electronic health records (EHRs) in hospital settings has become nearly ubiquitous. Whereas approximately 9% of hospitals were using EHRs in 2008, by 2020 adoption of EHRs among hospitals was approaching 100%.2 This creates unprecedented opportunities for researchers to automate the process