Decomposing the Twitter Data Stream in Healthcare: An Information Theory Perspective

Yuan Zhang
University of North Texas, USA. yz0209@unt.edu

Hsia-Ching Chang
University of North Texas, USA. Hsia-Ching.Chang@unt.edu

ABSTRACT

Recent research using Twitter as an information communication channel has shown how event organizers convey and disseminate their agendas across industries and disciplines. However, little research has examined, through the lens of information theory, the user's choice of information components when composing a tweet. This research employs a comparative case study to examine how medical-terminology hashtags and corresponding lay-language hashtags have been used to help communicate healthcare messages on the Twitter platform. The main result of this case study is that both retweeting behavior and the use of a variety of components to construct a tweet contribute to higher entropy values, which implies that these are more informative ways to communicate healthcare messages.

KEYWORDS

Twitter, tweets, hashtag, information theory, healthcare

INTRODUCTION

Composing a tweet on the Twitter platform involves choosing how to combine different components, such as photos, video clips, the @username "mention" function, hashtags, hyperlinks and up to 140 characters of text. An orchestrated presentation of various information components usually improves the usability, effectiveness and perceived quality of a campaign message. The focus of information theory (Shannon, 1948) is not on the meaning of the message but on the structure of the message. Entropy, the essence of information theory, provides researchers with a measurement to examine the variety of combinations of the different content components that form tweets.
Through the lens of information theory, this study focuses on understanding healthcare communication on Twitter by examining the structure of messages in a sample of tweets associated with healthcare-related hashtags.

RELATED WORKS

Developed by Shannon (1948), entropy in information theory is defined as the amount of information, calculated as the logarithm of the effective number of microstates of a closed system or the effective number of possible values of a random variable. Primarily adopted in engineering and computer science (Miller, 1953; Hayes, 1993; Kinsner, 2004), information theory has also been applied to linguistic studies. With the prevalence of social media, research interests have shifted to linguistic studies of the Twitter platform using information theory. Neubig and Duh (2013) examined information content per character in a tweet. Ghosh, Surachawala and Lerman (2011) introduced an entropy-based approach to characterizing the dynamics of retweeting behavior. To the best of our knowledge, no study has used the lens of information theory to investigate how tweets are composed from the different components (text, hashtag, hyperlink, image, etc.) that users can choose from when constructing their tweets.

METHODOLOGY

This study applies information theory to compare the use of medical-terminology hashtags and lay-language healthcare-related hashtags, namely #glucose versus #bloodsugar and #hypertension versus #bloodpressure, by comparing their statistical structures with regard to the choice of components used to compose a tweet.

Research Design: Using a Case Study Method to Compare the Components of Tweets in Hashtag Trails

Using entropy as a measurement, this study analyzes two pairs of healthcare hashtag trails with a focus on six main components used in the tweets. The six components are (1) image(s), (2) text with semantic meaning, (3) hashtag(s), (4) @username(s), (5) hyperlink, and (6) unused space.
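
As an illustration of how entropy can summarize the mix of components in a single tweet, the following Python sketch applies Shannon's formula, H = -sum(p_i * log2(p_i)), to the six components listed above. It is a minimal sketch under stated assumptions, not the coding scheme used in this study: the component names, the use of character counts as weights, and the example values are all hypothetical.

```python
import math
from typing import Mapping

# Hypothetical labels for the six components named in the text.
COMPONENTS = ("image", "text", "hashtag", "mention", "hyperlink", "unused_space")


def shannon_entropy(weights: Mapping[str, float]) -> float:
    """Return the Shannon entropy, in bits, of the normalized weights.

    Each weight is the share of the tweet taken up by one component.
    Measuring that share in characters is an assumption made here
    for illustration only.
    """
    total = sum(w for w in weights.values() if w > 0)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for w in weights.values():
        if w > 0:  # zero-weight components contribute nothing
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


# Illustrative tweets (made-up numbers, not data from the study).
single = {"text": 140}                      # text only
uniform = {name: 1 for name in COMPONENTS}  # all six components in equal shares
mixed = {"image": 23, "text": 60, "hashtag": 20,
         "mention": 10, "hyperlink": 23, "unused_space": 4}

if __name__ == "__main__":
    for label, tweet in (("single", single), ("uniform", uniform), ("mixed", mixed)):
        print(label, round(shannon_entropy(tweet), 3))
```

Under these assumptions, a tweet built from a single component has an entropy of 0 bits, while a tweet that spreads evenly over all six components reaches the maximum of log2(6), about 2.585 bits. A tweet that mixes components unevenly falls between these two bounds.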
Data Collection and Analysis with Entropy Calculation

Data for the two comparison cases, namely #glucose versus #bloodsugar and #hypertension versus #bloodpressure, were collected using the professional version of the NodeXL software application. Table 1 shows the summary of data preparation for the two groups of paired hashtag trails. Inspired by the work of Kearns and O'Connor (2004), this study draws on their approach of calculating "form complexity" in moving image documents. Therefore, this study not only examines the complexity of the "statistical structure" (Shannon, 1948) in a hashtag trail but also extends Shannon's original entropy equation to a multi-dimensional matrix by integrating six different content components from their own coding scheme. Table 1 illustrates each coding scheme and the matrix for calculating entropy for each component and for each tweet.

80th Annual Meeting of the Association for Information Science & Technology, Washington, DC | Oct. 27-Nov. 1, 2017
Authors Retain Copyright