Big Data Research 16 (2019) 49–58
www.elsevier.com/locate/bdr

Interactive Visual Analytics for Sensemaking with Big Text

Michelle Dowling, Nathan Wycoff, Brian Mayer, John Wenskovitch, Scotland Leman, Leanna House, Nicholas Polys, Chris North, Peter Hauck

Article info

Article history: Received 16 September 2018; received in revised form 22 March 2019; accepted 17 April 2019; available online 25 April 2019.

Keywords: Text analytics; Big data; Visualization; Interactive visual analytics; Semantic interaction; Topic modeling

Abstract

Analysts face many steep challenges when performing sensemaking tasks on collections of textual information larger than can be reasonably analyzed without computational assistance. To scale up such sensemaking tasks, new methods are needed to interactively integrate human cognitive sensemaking activity with machine learning. Towards that goal, we offer a human-in-the-loop computational model that mirrors the human sensemaking process and consists of foraging and synthesis sub-processes. We model the synthesis loop as an interactive spatial projection and the foraging loop as an interactive relevance ranking combined with topic modeling. We combine these two components of the sensemaking process using semantic interaction such that the human’s spatial synthesis actions are transformed into automated foraging and synthesis of new relevant information. Ultimately, the model’s ability to forage as a result of the analyst’s synthesis activities makes interacting with big text data easier and more efficient, thereby facilitating analysts’ sensemaking ability. We discuss the interaction design and theory behind our interactive sensemaking model.
The model is embodied in a novel visual analytics prototype called Cosmos, in which analysts synthesize structure within the larger corpus by directly interacting with a reduced-dimensionality space to express relationships on a subset of data. We then demonstrate how Cosmos supports sensemaking tasks with a realistic scenario that investigates the effect of natural disasters in Adelaide, Australia in September 2016 using a database of over 30,000 news articles.

© 2019 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail address: dowlingm@vt.edu (M. Dowling).

1. Introduction

The overarching goal of this work is to computationally augment human sensemaking capabilities in the context of big text analysis problems. For example, intelligence analysts must forage large collections of text for relevant information and synthesize a coherent story from fragments. Such sensemaking activities are modeled by Pirolli and Card’s “sensemaking loop” [1], which is composed of two primary, interconnected sub-loops: the foraging loop and the synthesis loop. Traditionally, much of this sensemaking activity, especially synthesis, requires human cognitive intelligence. However, to efficiently scale up sensemaking to big data, more semi-automated augmentation is needed. To support the human cognitive activity, it is important that the automation fits naturally into the human sensemaking workflow.

The sensemaking loop is a cognitive model. Thus, to support automation, one challenge is to concretize the sensemaking loop into a computationally oriented model with formalized sub-components. In this work, we formally model the synthesis loop as an interactive data structuring process, and the foraging loop as an interactive relevance model driven by the result of the structuring model. A related challenge is the high-dimensional nature of text data, which makes it difficult to support real-time, interactive structuring methods.
Our approach is to exploit topic modeling methods to reduce dimensionality between the foraging and the synthesis models.

https://doi.org/10.1016/j.bdr.2019.04.003
2214-5796/© 2019 Elsevier Inc. All rights reserved.

Yet a further challenge in enabling this automation lies in the human-centered, interactive, and iterative nature of sensemaking. For example, in the “dual search” process [1] that connects synthesis and foraging, analysts simultaneously identify hypotheses that synthesize the supporting evidence while also foraging for additional evidence for the hypotheses. Through iteration, analysts incrementally formalize [2] their hypotheses and arguments. To support this user-driven nature of the models, we exploit the principles of semantic interaction [3] to steer semi-supervised machine learning algorithms, updating the models based on learned user interest. Semantic interaction methods seek to learn users’ cognitive sensemaking intents by observing their interactions, such as their interactive structuring activities in the synthesis loop. This enables analysts to stay focused on their familiar sensemaking process rather than thinking about manipulating underlying statistical models. For our computational sensemaking model, this requires designing machine learning “inverses” [4,5] for the synthesis and foraging models that learn from users’ structuring and searching