An automatic electronic nursing records analysis system based on the text
classification and machine learning
Zhang Wei, Zheng Xian Ju, Xie Chun
Dept. of Computer Engineering
Chengdu Technological University
Chengdu, China
zhangwei0317@gmail.com
Jiang Hua, Peng Jin
Metabolomics and Multidisciplinary Laboratory
Institute for Emergency and Disaster Medicine
Sichuan Provincial People’s Hospital
Sichuan Academy of Medical Sciences
Chengdu, China
cdjianghua@gmail.com
Abstract—The enormous volume of unstructured electronic health
records is invaluable to medical research on the relationship
between a patient's disease and the final diagnosis. How to
mine this information automatically by computer has long been
an active research topic. To relate clinical outcomes to the
free text written by nurses, we developed an automatic
categorization system that processes natural-language nursing
records based on the vector space model. A total of 210
electronic nursing records of patients diagnosed with
pancreatitis were included in this study. We filtered the
restricted corpus for acute pancreatitis classification by
information gain (IG) and constructed a text classification
system based on partial least squares discriminant analysis
(PLS-DA) and the support vector machine (SVM). PLS loading
value analysis found that 20 terms can be used to classify the
medical record text. After training by PLS on pre-classified
sets, our machine-learning algorithm effectively classified
the free text of nursing records associated with normal and
acute pancreatitis diagnoses. This automatic identification
technology, aimed at large-scale medical documents, may
provide important clues for studying acute pancreatitis and
other important common diseases.
Keywords-Text classification; Partial least squares; Support vector
machine; Information gain; Pancreatitis
I. INTRODUCTION
With the rapid development and popularity of electronic
medical record (EMR) technology, a sharply increasing volume
of medical record text is stored in computer-readable form.
How to automatically classify, organize and manage this
voluminous literature, information and data (most of it text)
has become an important topic for research in text mining and
machine learning. Nursing records include a great deal of
analysis and description of original information from
patients. Combined with physical examination and various
clinical laboratory test results, this original information
can often reflect the overall changes in the patient's
condition and shows a high correlation with the final clinical
diagnosis. Until now, no one has been able to develop
software that can automatically identify and understand
the text of nursing medical records. However, the development
of software and algorithms for a particular disease process
plays an important role in understanding the disease [1].
Supported by Research Foundation of Chengdu Technological University
No. KY1211009B and Sichuan Provincial Education Board No. 13ZA0047
How to distinguish severe acute pancreatitis from normal
acute pancreatitis in clinical diagnosis has been an
important challenge for clinicians. A patient presenting with
abdominal pain, vomiting, fever, hematuria and a rise in
amylase may be diagnosed with acute pancreatitis, but a series
of imaging examinations, including ultrasound, is key to
reaching the final diagnosis. There has long been a lack of
quantitative research mining the relationship between the
final diagnosis and the early clinical diagnosis. The
electronic nursing record summary is a clinical summary
recorded by the nurse after the first nursing ward round; it
contains the unbiased digital text of the summarized clinical
observations. The information contained in this sort of record
is likely to be closely related to the final outcome and
prognosis of patients.
A large number of nursing records are written in Chinese
characters. To construct a proper classifier, we need to
convert this large volume of unstructured Chinese nursing
documents into structured documents through automatic word
segmentation by computer. At present, there are two ways to
convert unstructured documents into structured documents and
extract the key definitions: knowledge engineering (KE) and
machine learning (ML) [2]. For relatively small amounts of
data, KE gives better extraction results. For a large number
of unknown records, machine learning has more advantages. On
the one hand, the nursing medical record system consists of
short texts described with a large number of professional
terms; on the other hand, no literature shows that a small
number of professional terms can accurately describe the
condition of a patient with a specific disease. To obtain the
most important professional terms from a large number of
descriptive sentences, machine learning is a very important
method for extracting medical record information.
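As a rough illustration of how the most informative professional
terms might be picked out of segmented records, the sketch below
ranks terms by information gain (IG), the filtering criterion named
in the abstract. It is a minimal sketch, not the implementation used
in this study: the function names (entropy, information_gain,
rank_terms), the English placeholder terms and the four toy records
with their labels are all invented for illustration, and each record
is assumed to have already been segmented into a set of terms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def rank_terms(docs, labels, top_k=20):
    """Return the top_k (term, IG) pairs, highest IG first."""
    vocabulary = set().union(*docs)
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    # Sort by descending IG, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(term, ig) for ig, term in scored[:top_k]]


# Hypothetical toy data: each record is a set of already-segmented terms.
docs = [
    {"abdominal_pain", "fever", "vomiting"},
    {"abdominal_pain", "fever", "dyspnea"},
    {"abdominal_pain", "vomiting"},
    {"abdominal_pain", "dyspnea", "vomiting"},
]
labels = ["normal", "severe", "normal", "severe"]

ranking = rank_terms(docs, labels, top_k=3)
for term, ig in ranking:
    print(f"{term}: {ig:.3f}")
```

In this toy data, a term that appears in every record (here
"abdominal_pain") carries no information about the class and scores
zero, whereas a term that appears only in one class scores the
maximum. This is the reason an IG filter can shrink a large
descriptive vocabulary to a short list of discriminating terms.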
Using the HowNet knowledge base to convert and structure
the nursing records, we performed pattern recognition with
partial least squares and the support vector machine (SVM)
to find a way to get specific
2013 Fifth International Conference on Intelligent Human-Machine Systems and Cybernetics
978-0-7695-5011-4/13 $26.00 © 2013 IEEE
DOI 10.1109/IHMSC.2013.265