Clinical Decision Support for Stroke using Statistical Models for NIHSS Scores

Vaibhav Rajan, Sakyajit Bhattacharya
Xerox Research Centre India
Bangalore, India
Email: {vaibhav.rajan, sakyajit.bhattacharya}@xerox.com

Ranjan Shetty, Amith Sitaram and G. Vivek
Kasturba Medical College, Manipal University
Manipal, India

Abstract—Cerebral stroke is a leading cause of physical disability and death in the world. The severity of a stroke is determined by a neurological examination and is quantified using a scale known as the NIH stroke scale (NIHSS). As a measure of stroke severity, the NIHSS score is widely adopted, and several studies have shown the efficacy of using the score in a variety of contexts, for example in outcome prediction, rehabilitation planning and treatment planning. In this paper we propose a novel statistical model for predicting stroke severity. While being of scientific value, since it reveals factors affecting stroke severity, such a model also has many practical applications in clinical decision support systems. We also illustrate the use of copula based modeling for data with complex feature dependencies, as are commonly found in healthcare data. We design a vine copula based Bayesian classifier for predicting NIHSS scores, which outperforms several current classification techniques.

I. INTRODUCTION

Stroke is the second leading cause of death and a major cause of neurological disability in the world [1]. Annually, 15 million people worldwide suffer a stroke. Of these, 5 million die and another 5 million are left permanently disabled [2]. Stroke is not limited to Western or high-income countries: about 85% of all stroke deaths are registered in low- and middle-income countries.

After a stroke occurs, one of the first steps undertaken by a neurologist is to ascertain the severity of the stroke and its effects on the patient.
The National Institutes of Health (NIH) Stroke Scale (NIHSS) is a standardized scale used to assess and quantify the level of neurological impairment of a stroke patient. The NIHSS score is widely used and is a consistent measure of stroke severity. It has been extensively studied and found to be an effective predictor of patient outcome and hospital disposition. It is also used in treatment and dosage planning for stroke patients.

In this paper we propose a new model for stroke severity through a model for NIHSS scores. The model uses readily available predictors such as blood investigation results, past conditions of the patient, medications being taken, and demographic factors such as age, gender and socioeconomic status. To the best of our knowledge, a study of how these factors affect stroke severity has not been conducted, and no statistical model for stroke severity exists today. While being of scientific merit in itself, since it reveals risk factors that distinguish a mild stroke from a severe stroke, the model has several practical applications in treatment planning, hospital resource management and automated clinical decision support systems in general. A variety of potential applications of the model are detailed in section III.

The second contribution of the paper is an illustration of copula based modeling for healthcare data. Copulas are functions that allow a multivariate distribution to be expressed in terms of its constituent univariate marginals. This allows us to model the marginal distributions independently of the dependence structure, which can be linear or non-linear. Copulas have been used extensively in finance [3], but their use with biomedical data is scarce. Vine copulas provide a flexible and elegant way of combining multiple copulas to form a high dimensional model. In healthcare, copula based models are particularly useful due to the complex intrinsic dependencies prevalent in most datasets.
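
As a minimal sketch of the idea that marginals and dependence can be modeled separately, consider the bivariate case, where Sklar's theorem writes a joint distribution as F(x1, x2) = C(F1(x1), F2(x2)) for a copula C. The example below is illustrative only and is not the model used in this paper: it assumes a Gaussian copula with a hypothetical correlation parameter rho, and two arbitrarily chosen marginals (exponential and logistic) that do not come from the study data. All function names are hypothetical. It checks that changing the marginals leaves the rank dependence (Kendall's tau) unchanged.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u1, u2) with uniform marginals coupled by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform on (0, 1).
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def _clamp(u):
    """Keep u strictly inside (0, 1) so the logarithms below are defined."""
    return min(max(u, 1e-12), 1.0 - 1e-12)


def exponential_ppf(u, rate):
    """Inverse CDF of an exponential distribution (illustrative marginal)."""
    return -math.log(1.0 - _clamp(u)) / rate


def logistic_ppf(u, loc, scale):
    """Inverse CDF of a logistic distribution (illustrative marginal)."""
    u = _clamp(u)
    return loc + scale * math.log(u / (1.0 - u))


def kendall_tau(xs, ys):
    """Kendall's tau by direct pair counting (O(n^2), fine for small n)."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                total += 1
            elif s < 0:
                total -= 1
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    rho = 0.7  # assumed dependence parameter, chosen for illustration
    pairs = sample_gaussian_copula(500, rho, seed=42)
    u1 = [p[0] for p in pairs]
    u2 = [p[1] for p in pairs]
    # Apply marginals chosen independently of the dependence structure.
    x1 = [exponential_ppf(u, rate=0.5) for u in u1]
    x2 = [logistic_ppf(u, loc=10.0, scale=2.0) for u in u2]
    print("tau on copula scale:", kendall_tau(u1, u2))
    print("tau after marginals:", kendall_tau(x1, x2))
    print("Gaussian copula tau:", 2.0 / math.pi * math.asin(rho))
```

Because the inverse CDFs are increasing, they preserve the ordering of the samples, so the two empirical tau values agree. A vine copula extends this construction to many variables by combining several bivariate copulas, which this sketch does not attempt.
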
While there are several expositions on copulas that provide theoretical details, almost none guide the practitioner in using vine copula based models on their own data. We hope to fill this gap in this paper.

The rest of the paper is organized as follows. We begin with a brief background on stroke and outline the uses of NIHSS scores in section II. Section III describes the potential uses of our new model for stroke severity. Section IV describes the data used in the study. In section V we perform standard univariate analysis of the factors affecting stroke severity. This is followed by a brief description of copulas and vines in section VI, which outlines how copulas can be used to model multivariate data. A vine copula based model for stroke severity is then described, and it is used in the following section to build a predictive classifier. We conclude in section VIII.

II. BACKGROUND

A stroke happens when blood flow to a part of the brain stops. If blood flow is stopped for longer than a few seconds, the brain cannot get blood and oxygen. Brain cells can die, causing permanent damage. There are two major types of stroke: ischemic stroke and hemorrhagic stroke.

Ischemic stroke occurs when a blood vessel that supplies blood to the brain is blocked by a blood clot. This may happen in two ways: (i) a clot may form in an already narrow artery, which is called a thrombotic stroke; (ii) a clot may break off from another place in the blood vessels and travel to the brain, which is called a cerebral embolism or an embolic stroke.

A hemorrhagic stroke occurs when a blood vessel in part of the brain becomes weak and ruptures, causing blood to leak into the brain.