On Curating Responsible and Representative Healthcare Video Recommendations for Patient Education and Health Literacy: An Augmented Intelligence Approach

Krishna Pothugunta, Accounting and Information Systems, Michigan State University, East Lansing, MI, USA, pothugun@msu.edu
Xiao Liu, Information Systems, Arizona State University, Tempe, AZ, USA, xiao.liu.10@asu.edu
Anjana Susarla, Accounting and Information Systems, Michigan State University, East Lansing, MI, USA, asusarla@msu.edu
Rema Padman, Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA, rpadman@cmu.edu

ABSTRACT

Studies suggest that one in three US adults use the Internet to diagnose or learn about a health concern. However, such access to health information online could exacerbate the disparities in health information availability and use. Health information seeking behavior (HISB) refers to the ways in which individuals seek information about their health, risks, illnesses, and health-protective behaviors. For patients engaging in searches for health information on digital media platforms, health literacy divides can be exacerbated both by their own lack of knowledge and by algorithmic recommendations, with results that disproportionately impact disadvantaged populations, minorities, and low health literacy users. This study reports on an exploratory investigation of the above challenges by examining whether responsible and representative recommendations can be generated using advanced analytic methods applied to a large corpus of videos and their metadata on a chronic condition (diabetes) from the YouTube social media platform. The paper focuses on biases associated with the demographic characteristics of actors, using videos on diabetes that were retrieved and curated for multiple criteria, such as encoded medical content and understandability, to address patient education and population health literacy needs.
This approach offers an immense opportunity for innovation in human-in-the-loop, augmented-intelligence, bias-aware and responsible algorithmic recommendations by combining the perspectives of health professionals and patients into a scalable and generalizable machine learning framework for patient empowerment and improved health outcomes.

KEYWORDS

Patient Education, Responsible Video Recommendations, Health Literacy, Machine Learning, Natural Language Processing

1. Introduction

The World Health Organization (WHO) defines health literacy as "the cognitive and social skills which determine the motivation and ability of individuals to gain access to, understand and use information in ways which promote and maintain good health" [23]. With estimates that only 12% of the US adult population is proficient in health literacy, healthcare knowledge and advice in video format may be more acceptable to much of the public [22]. Social media, particularly multimedia-rich visual social media, offers tremendous promise as a pathway for contextualized patient education and empowerment and public health literacy. However, it has yet to be investigated how to produce fair and responsible recommendations, in the form of curated health information videos, that address the huge diversity of needs, abilities, and interests of consumers and improve their health knowledge, self-care skills and health outcomes. The challenge [23] with finding relevant and responsible advice on social media is two-fold. First, we need scalable and generalizable methods that can identify and retrieve health education videos with encoded medical content that is highly understandable. Second, we need a fair and bias-minimizing approach that ensures recommendations are not skewed against a particular demographic group or set of ideas.
In this paper, we summarize both the challenges in this undertaking and the augmented intelligence approach we have developed, which can subsequently be evaluated using social science methods.

1.1 Patient Education, Health Literacy, and Information Gaps

The current health education process for patients and the public has substantial information gaps. For patients engaging in searches for health information on digital media platforms, the gap in health literacy is exacerbated by their own lack of digital literacy, challenges of misinformation and disinformation, and the disparities in health information availability and use. While traditional media (such as books and brochures) and healthcare professionals have long been the primary sources of health information, the advent of the Internet and social media has brought a sea change in health information seeking behavior (HISB). There is a dearth of knowledge on digital literacy gaps and content biases and how these are reflected in HISB on social media and digital platforms.

Evaluating patients’ comprehension of educational materials in the healthcare delivery setting is a challenge, amplified by low health literacy levels that hinder the correct interpretation of health information [21]. Benefiting from such educational materials also requires an elevated level of patient participation and engagement [10]. The rise of YouTube as a platform for the dissemination of healthcare information potentially offers a novel pathway to enhance patient education and the utilization of existing resources [16]. Users typically encounter videos on healthcare conditions through keyword searches on YouTube. It is a daunting challenge for both patients and clinicians to search for responsibly curated videos, retrieve them for each care delivery context, and use them in the form of just-in-time, contextualized, prescriptive digital therapeutic interventions.
Users are heterogeneous in their health information needs as well as in their levels of health literacy, and relationship-oriented factors, such as trust and physician communication style, have been linked to disparities in patient satisfaction, delivery of preventive care services, appropriate use of referrals, and patient follow-through on treatments. Given this diverse set of challenges, it is necessary to develop a scalable, human-in-the-loop, augmented intelligence approach that synthesizes multiple machine learning methods with annotation by domain experts to extract relevant video content from digital platforms. Furthermore, a causal framework will permit us to understand the process of HISB and its impact on outcomes, and to evaluate the videos for accuracy and credibility before recommending them for public use.