7th Annual Conference and Exhibition

Developing SNOMED CT Subsets from Clinical Notes for Intensive Care Service

Jon Patrick, Yefeng Wang, Peter Budd
School of Information Technologies, University of Sydney, Sydney, Australia
{jonpat, ywang1, pbudd}@it.usyd.edu.au

Alan Rector, Sebastian Brandt, Jeremy Rogers
Department of Computer Science, University of Manchester, Manchester, UK
arector@cs.man.ac.uk, brandt@cs.manchester.ac.uk, jeremy.rogers@nhs.net

Robert Herkes, Angela Ryan
Intensive Care Service, Royal Prince Alfred Hospital, Sydney, NSW, Australia
roberth@mail.usyd.edu.au, angela@cs.usyd.edu.au

Bahram Vazirnezhad
Department of Biomedical Engineering, Amirkabir University of Technology, Iran
bahram@it.usyd.edu.au

Abstract

This paper describes the development of a SNOMED CT subset derived from clinical notes. A corpus of 44 million words of patient progress notes was drawn from the clinical information system of the Intensive Care Service (ICS) at the Royal Prince Alfred Hospital, Sydney, Australia. This corpus was processed by a variety of natural language processing procedures, including the computation of all SNOMED CT candidate codes. There are about 13 million concept instances comprising about 30,000 unique concept types detected in the corpus. These instances have been processed by a tool which computes the closure of the minimal sub-tree of concept types in the SNOMED hierarchy, thus inferring the complete subset of SNOMED CT that would be necessary for an intensive care unit. A subset of about 2700 concepts gives a coverage of 96% of the corpus, and the transitive closure uses less than 1% of SNOMED concepts and relationships. Use of this subset will enable clinical information systems to efficiently deliver SNOMED CT terminology to the presentation interface.
1 Objectives

This study uses the contents of the clinical notes collected from an ICU's clinical information system, CareVue Classic (Philips Medical Systems, Andover, MA), to compute a suitable ICU subset. The notes were originally extracted, anonymised and analysed in a variety of ways for two objectives: to identify linguistic characteristics relevant to successful automatic processing of the narratives, and to understand the everyday use of written (typed) clinical language and assess where it might be improved. Subsequently it became apparent that the narratives contain the information needed to define the language of the ICU, and hence the concepts that need to be expressed in an ICU clinical dialect. From this study it appears feasible to use the clinical notes to construct an ICU subset of SNOMED CT (SCT), as an alternative to assembling a subset through a Delphi process with expert intensivists.
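
The subset construction summarised in the abstract (select the concepts detected in the notes, then close them upward through the SNOMED CT hierarchy) can be illustrated with a minimal Python sketch. It is not the tool used in this study. The concept names, the toy is-a hierarchy, the instance counts and the function names are all hypothetical, and ranking concepts by frequency until a target coverage is reached is an assumption about how a seed subset might be chosen.

```python
from collections import Counter


def ancestor_closure(seed_concepts, parents):
    """Return the seeds plus every ancestor reachable through is-a links."""
    closure = set()
    stack = list(seed_concepts)
    while stack:
        concept = stack.pop()
        if concept in closure:
            continue
        closure.add(concept)
        # A concept may have several parents, so the hierarchy is a DAG.
        stack.extend(parents.get(concept, ()))
    return closure


def coverage(subset, instance_counts):
    """Fraction of all concept instances whose concept is in the subset."""
    total = sum(instance_counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for c, n in instance_counts.items() if c in subset)
    return covered / total


def most_frequent_until(instance_counts, target):
    """Take concepts in descending frequency until coverage reaches target."""
    total = sum(instance_counts.values())
    chosen, covered = [], 0
    for concept, n in instance_counts.most_common():
        if total and covered / total >= target:
            break
        chosen.append(concept)
        covered += n
    return chosen


# Hypothetical toy hierarchy: child -> list of is-a parents (not real SNOMED CT codes).
parents = {
    "pneumonia": ["lung_disease", "infection"],
    "lung_disease": ["clinical_finding"],
    "infection": ["clinical_finding"],
    "fracture": ["clinical_finding"],
    "clinical_finding": ["root"],
    "intubation": ["procedure"],
    "procedure": ["root"],
}

# Hypothetical counts of concept instances detected in a corpus.
counts = Counter({"pneumonia": 90, "intubation": 8, "fracture": 2})

seeds = most_frequent_until(counts, target=0.96)
subset = ancestor_closure(seeds, parents)
print(sorted(subset))
print(coverage(set(seeds), counts))
```

In this toy run the two most frequent concepts already reach the target, and the closure adds their five ancestors so that the subset remains a connected fragment of the hierarchy.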