1 Moore CR, et al. BMJ Open 2021;11:e047356. doi:10.1136/bmjopen-2020-047356
Open access
Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study
Carlton R Moore,¹ Saumya Jain,² Stephanie Haas,³ Harish Yadav,³ Eric Whitsel,² Wayne Rosamand,² Gerardo Heiss,² Anna M Kucharska-Newton²
To cite: Moore CR, Jain S, Haas S, et al. Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing: a multicentre Atherosclerosis Risk in Communities (ARIC) validation study. BMJ Open 2021;11:e047356. doi:10.1136/bmjopen-2020-047356
► Prepublication history and supplemental material for this paper are available online. To view these files, please visit the journal online (http://dx.doi.org/10.1136/bmjopen-2020-047356).
Received 28 November 2020
Accepted 05 May 2021
For numbered affiliations see end of article.
Correspondence to
Dr Carlton R Moore;
crmoore@med.unc.edu
Original research
© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
ABSTRACT
Objectives To determine the performance of natural language processing (NLP) in ascertaining Framingham heart failure (HF) criteria and phenotype from free-text clinical notes and reports of hospitalised patients.
Study design A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in ascertaining Framingham HF criteria and phenotype.
Setting Four ARIC study hospitals, each representing an
ARIC study region in the USA.
Participants A stratified random sample of hospitalisations that occurred during 2015 and were identified using a broad range of International Classification of Diseases, ninth revision, diagnostic codes indicative of an HF event was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations were used as the validation dataset.
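The participant selection is a two-step design: a stratified random sample is drawn, and that sample is then randomly divided into derivation and validation sets. This section does not give the strata or the sampling fractions, so the Python sketch below assumes a hypothetical stratum key (hospital is used purely for illustration) and a single per-stratum fraction; the function names are illustrative, not the study's code.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction of records from each stratum (hypothetical scheme)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(by_stratum):  # deterministic stratum order for reproducibility
        group = by_stratum[key]
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))  # sampling without replacement
    return sample

def split_derivation_validation(sample, n_derivation, seed=0):
    """Randomly partition a sample into derivation and validation sets."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    return shuffled[:n_derivation], shuffled[n_derivation:]
```

For example, with 1600 made-up hospitalisations spread evenly over four hypothetical hospital strata and a fraction of 0.5, `stratified_sample` returns 800 records, and `split_derivation_validation(sample, 394)` yields sets of 394 and 406, mirroring the sizes reported above. Seeding a local `random.Random` keeps the draw reproducible without touching global random state.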
Intervention Use of NLP on free-text clinical notes
and reports to ascertain Framingham HF criteria and
phenotype.
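Ascertaining the phenotype means combining the individual criteria. The classic Framingham definition requires at least two major criteria, or one major criterion together with at least two minor criteria. A minimal Python sketch of that rule follows, assuming the criteria have already been ascertained (by NLP or by abstractors); the criterion identifiers are hypothetical and abbreviated, and any further study-specific rules (for example, discounting minor criteria attributable to another condition) are assumed to be applied before this step.

```python
from typing import AbstractSet

# Hypothetical, abbreviated criterion identifiers (not the study's exact list).
MAJOR_CRITERIA = frozenset({
    "pnd_or_orthopnoea", "neck_vein_distension", "rales", "cardiomegaly",
    "acute_pulmonary_oedema", "s3_gallop", "hepatojugular_reflux",
})
MINOR_CRITERIA = frozenset({
    "ankle_oedema", "night_cough", "dyspnoea_on_exertion", "hepatomegaly",
    "pleural_effusion", "tachycardia",
})

def meets_framingham_hf(found: AbstractSet[str]) -> bool:
    """Apply the classic Framingham rule to a set of ascertained criteria:
    at least 2 major criteria, or 1 major plus at least 2 minor criteria."""
    n_major = len(set(found) & MAJOR_CRITERIA)
    n_minor = len(set(found) & MINOR_CRITERIA)
    return n_major >= 2 or (n_major >= 1 and n_minor >= 2)

if __name__ == "__main__":
    print(meets_framingham_hf({"rales", "s3_gallop"}))                    # True
    print(meets_framingham_hf({"rales", "ankle_oedema", "night_cough"}))  # True
    print(meets_framingham_hf({"rales", "ankle_oedema"}))                 # False
```

Keeping the phenotype rule separate from criterion extraction lets the same rule be applied to NLP output and to abstractor data, so that both are judged on identical logic.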
Primary and secondary outcome measures NLP performance, as measured by sensitivity, specificity, positive predictive value (PPV) and agreement in the ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard.
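Each of these measures comes from a 2×2 table that cross-classifies the NLP result against the abstractor reference standard. The Python sketch below assumes that "agreement" means simple percent agreement (the proportion of hospitalisations on which NLP and abstractors concur); the class name and the counts in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class Confusion:
    """2x2 table of NLP output against the manual-abstraction reference standard."""
    tp: int  # NLP positive, reference positive
    fp: int  # NLP positive, reference negative
    fn: int  # NLP negative, reference positive
    tn: int  # NLP negative, reference negative

    @classmethod
    def from_pairs(cls, pairs: Iterable[Tuple[bool, bool]]) -> "Confusion":
        """Build the table from (nlp_label, reference_label) pairs."""
        tp = fp = fn = tn = 0
        for nlp, ref in pairs:
            if nlp and ref:
                tp += 1
            elif nlp:
                fp += 1
            elif ref:
                fn += 1
            else:
                tn += 1
        return cls(tp, fp, fn, tn)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # true positives among reference positives

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)  # true negatives among reference negatives

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # reference positives among NLP positives

    @property
    def agreement(self) -> float:
        # Assumed here to be simple percent agreement, not a chance-corrected statistic.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)
```

With invented counts `Confusion(tp=45, fp=5, fn=15, tn=35)`, the four properties evaluate to 0.75, 0.875, 0.9 and 0.8. Sensitivity and specificity condition on the reference standard, whereas PPV conditions on the NLP output, which is why PPV also depends on how common HF is in the sampled hospitalisations.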
Results Overall, NLP performance in ascertaining Framingham HF phenotype in the validation dataset was good: sensitivity 78.8%, specificity 81.7%, PPV 84.4% and agreement 80.0%.
Conclusions Our results on the use of NLP to ascertain Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology, by decreasing the need for manual chart review, has the potential to significantly improve the feasibility and efficiency of large-scale epidemiologic surveillance of HF prevalence and incidence.
INTRODUCTION
Since the passage of the Health Information Technology for Economic and Clinical Health Act in 2009,¹ the use of electronic health records (EHRs) in hospital settings has become nearly ubiquitous. Whereas approximately 9% of hospitals were using EHRs in 2008, by 2020 EHR adoption among hospitals was approaching 100%.²
This creates unprecedented opportunities
for researchers to automate the process
Strengths and limitations of this study
► The article describes the first study to evaluate the performance of natural language processing (NLP) applied to free-text clinical notes and reports stored in electronic health records to ascertain Framingham heart failure phenotype in multiple regionally dispersed hospitals in the USA with different health systems.
► NLP performance (sensitivity, specificity, positive predictive value and agreement) is assessed against a reference standard of manual extraction of prespecified information by trained and certified abstractors, who followed a highly standardised protocol with quality assurance programmes in place to monitor the accuracy, completeness and repeatability of the process.
► The NLP programme used open-source software: the clinical Text Analysis and Knowledge Extraction System (cTAKES) and Python.
► A limitation of the study is that it includes only a subset of hospitalised patients at risk for acute decompensated heart failure, identified from International Classification of Diseases, ninth revision, diagnostic codes, and is therefore not representative of the general hospitalised population.
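The NLP programme combined cTAKES (a Java pipeline built on Apache UIMA) with Python; that pipeline is not reproduced here. As a rough, hypothetical illustration of the underlying task, the Python sketch below flags affirmative mentions of a few Framingham criteria in note text and skips mentions preceded by a simple negation cue, a crude stand-in for NegEx-style negation detection. The lexicon, criterion labels and 30-character window are all assumptions, not the study's rules.

```python
import re
from typing import Set

# Hypothetical, abbreviated lexicon: surface phrase -> criterion label.
LEXICON = {
    "orthopnea": "pnd_or_orthopnoea",
    "paroxysmal nocturnal dyspnea": "pnd_or_orthopnoea",
    "rales": "rales",
    "crackles": "rales",
    "jvd": "neck_vein_distension",
    "jugular venous distension": "neck_vein_distension",
    "s3": "s3_gallop",
    "pleural effusion": "pleural_effusion",
    "ankle edema": "ankle_oedema",
}
# Crude negation cues, searched for in a short window before each mention.
NEGATION = re.compile(r"\b(?:no|denies|without|negative for)\b")
WINDOW = 30  # characters scanned before a mention (assumption)

def find_criteria(note: str) -> Set[str]:
    """Return criterion labels mentioned affirmatively in a free-text note."""
    found = set()
    # Naive sentence split on periods, semicolons and newlines.
    for sentence in re.split(r"[.;\n]", note.lower()):
        for phrase, label in LEXICON.items():
            # Only the first occurrence of each phrase per sentence is checked.
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if match is None:
                continue
            window = sentence[max(0, match.start() - WINDOW):match.start()]
            if NEGATION.search(window):
                continue  # mention looks negated, so ignore it
            found.add(label)
    return found

if __name__ == "__main__":
    note = "Patient denies orthopnea. Bibasilar crackles noted; JVD present."
    print(sorted(find_criteria(note)))  # ['neck_vein_distension', 'rales']
```

A fixed character window is easy to fool (for example, "no oedema, rales present" would wrongly negate rales), which is one reason production systems use dedicated assertion and negation components and why validation against manual abstraction matters.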
BMJ Open: first published as 10.1136/bmjopen-2020-047356 on 14 June 2021. Downloaded from http://bmjopen.bmj.com/ on February 19, 2022 by guest. Protected by copyright.