A New Framework for Distilling Higher Quality Information from Health Data via Social Network Analysis

M. Baglioni, S. Pieroni, F. Geraci, F. Mariani, S. Molinaro, M. Pellegrini and E. Lastres

Istituto di Informatica e Telematica, CNR, Pisa
Email: miriam.baglioni@iit.cnr.it, filippo.geraci@iit.cnr.it, marco.pellegrini@iit.cnr.it
Istituto di Fisiologia Clinica, CNR, Pisa
Email: s.pieroni@ifc.cnr.it, marifa@ifc.cnr.it, molinaro@ifc.cnr.it
Sistemi territoriali s.r.l, Navacchio
Email: e.lastres@sister.it

Abstract—Personalized medicine, as well as systems biology, poses the challenge of developing new models that connect health data coming from many different flows and extract new information from them to support clinicians in their therapeutic activity. In this scenario we developed a novel framework, tailored to clinicians' needs, which exploits the strength of the social network model to represent the health care system as a whole. In this paper we also propose a data analysis approach inspired by the human cognitive process, in which awareness of a phenomenon results from an exploration step, where situations of possible interest are identified, followed by an in-depth examination step, where the phenomenon is characterized. Experiments have shown that our framework is able to provide effective answers to complex enquiries submitted by clinicians for which standard statistical methods fail.

I. INTRODUCTION

The holistic approach to patient treatment suggested by personalized medicine has become standard practice among clinicians. To be implemented effectively, personalized medicine requires collecting heterogeneous information from many sources and organizing it into a unified data model. In particular, it is essential that all the documents generated during the interactions between the patient and the health care infrastructure be available as electronic records.
This trail of documents forms the so-called flow of health data, which includes hospital discharge letters, drug prescriptions, specialist health-care records, and death records. Together, these documents make it possible to draw a comprehensive picture of a patient's health state, to trace her/his pathological history, and to evaluate the overall performance of the health care infrastructure.

In this scenario, social networks (often referred to as complex networks in health care and in systems biology [1], [2]) can provide a convenient framework for handling data coming from different streams and highlighting the relationships among them, because they allow different types of subjects and their relationships to be represented in the same network. This matches the goal of representing the health care system as a whole.

Among their other characteristics, the strength of social networks lies in their solid theoretical background derived from graph theory. Social network analysis has taken advantage of this background to design powerful tools that provide a deeper understanding of many emergent global phenomena.

The most natural way to represent the health care infrastructure as a social network is to have a class of nodes for each type of subject involved in the flow of health data (e.g., patients, clinicians, pathologies). The semantics of a relationship depends on the types of the nodes it connects. The network model allows relationships both between nodes of different types and between nodes of the same type. For example, a patient can be connected to a doctor if the latter has visited her/him, and two drugs can be related if they fall in the same pharmaceutical class. In our framework we adopted this representation of the health care infrastructure because it has proven intuitive for clinicians, who are not required to learn a new data model.
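To make the typed-node model concrete, the following is a minimal sketch in plain Python, assuming a simple undirected adjacency-map representation. It is not the framework's actual data structure: the class `HealthNetwork`, its methods, the node identifiers and the relation labels (`visited_by`, `same_pharma_class`) are all hypothetical, chosen only to mirror the two examples in the text (a patient linked to a doctor, and two drugs in the same pharmaceutical class).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HealthNetwork:
    """Hypothetical multi-type network: every node belongs to a class
    (patient, doctor, drug, ...) and every edge carries a relation label
    whose meaning depends on the classes of its two endpoints."""
    node_type: dict = field(default_factory=dict)                  # node id -> node class
    adj: dict = field(default_factory=lambda: defaultdict(dict))   # u -> {v: relation}

    def add_node(self, node_id, node_class):
        self.node_type[node_id] = node_class

    def add_edge(self, u, v, relation):
        # Undirected edge; both endpoints must have been added first.
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be added before the edge")
        self.adj[u][v] = relation
        self.adj[v][u] = relation

    def neighbors(self, node_id, node_class=None):
        """Sorted neighbors of node_id, optionally restricted to one node class."""
        return sorted(
            v for v in self.adj[node_id]
            if node_class is None or self.node_type[v] == node_class
        )


net = HealthNetwork()
net.add_node("patient_1", "patient")
net.add_node("patient_2", "patient")
net.add_node("doctor_A", "doctor")
net.add_node("drug_X", "drug")
net.add_node("drug_Y", "drug")

# Relationship between nodes of different classes: the doctor visited the patient.
net.add_edge("patient_1", "doctor_A", "visited_by")
net.add_edge("patient_2", "doctor_A", "visited_by")
# Relationship between nodes of the same class: same pharmaceutical class.
net.add_edge("drug_X", "drug_Y", "same_pharma_class")

print(net.neighbors("doctor_A", "patient"))  # ['patient_1', 'patient_2']
print(net.neighbors("drug_X"))               # ['drug_Y']
```

Storing the node class separately from the adjacency map keeps a single network for all subject types, while still letting a query filter neighbors by class, which is the property the text relies on when it represents the health care system as a whole.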
We also propose an analysis approach inspired by the human cognitive process, in which awareness of a phenomenon results from exploring the world, identifying phenomena of possible interest, and examining these phenomena in depth. To this end, we designed analysis and visualization algorithms that guide the clinician along an ideal path in which she/he can: explore and visualize (portions of) the social network; identify structures arising from phenomena of possible interest whose details are not known a priori; and find all the instances of an interesting structure in order to examine it in depth. For example, suppose we are interested in identifying whether there exist patients who share the same pathological path. Even without a priori information about the patients and pathologies involved, our framework is able to recognize and enumerate all such situations, which the human expert can then investigate further.

According to configurable selection criteria, our visualization algorithms draw a portion of the social network and present it to the user with the most convenient layout. We implemented three main layout methods: 1) the force-based layout, in which all the nodes are arranged around a pivot node; 2) the tree layout, in which the graph is mapped onto a rooted tree; and 3) the circular layout, in which the nodes
2013 IEEE 13th International Conference on Data Mining Workshops 978-0-7695-5109-8/13 $31.00 © 2013 IEEE DOI 10.1109/ICDMW.2013.142 48