AnchorViz: Facilitating Semantic Data Exploration and Concept Discovery for Interactive Machine Learning

JINA SUH, SOROUSH GHORASHI, and GONZALO RAMOS, Microsoft Research, Redmond, Washington
NAN-CHEN CHEN, University of Washington, Seattle, Washington
STEVEN DRUCKER, JOHAN VERWEY, and PATRICE SIMARD, Microsoft Research, Redmond, Washington

When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful reference for making the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can either help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration. By creating example-based or dictionary-based anchors representing concepts, users create a topology that (a) spreads data based on their similarity to the concepts and (b) surfaces the prediction and label inconsistencies between data points that are semantically related. Once such inconsistencies and errors are discovered, users can encode the new information as labels or features and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction errors than stratified random and uncertainty sampling methods. Furthermore, during the beginning stages of a training task, an iML tool with AnchorViz can help users build classifiers comparable to those built with the same tool using uncertainty sampling and keyword search, but with fewer labels and more generalizable features.
We discuss exploration strategies observed during the two studies and how AnchorViz supports discovering, labeling, and refining concepts through a sensemaking loop.

CCS Concepts: • Human-centered computing → Interactive systems and tools;

Additional Key Words and Phrases: Interactive machine learning, visualization, error discovery, semantic data exploration, unlabeled data, concept discovery, machine teaching

ACM Reference format:
Jina Suh, Soroush Ghorashi, Gonzalo Ramos, Nan-Chen Chen, Steven Drucker, Johan Verwey, and Patrice Simard. 2019. AnchorViz: Facilitating Semantic Data Exploration and Concept Discovery for Interactive Machine Learning. ACM Trans. Interact. Intell. Syst. 10, 1, Article 7 (August 2019), 38 pages.
https://doi.org/10.1145/3241379

The reviewing of this article was managed by special issue associate editors Mark Billinghurst, Margaret Burnett, and Aaron Quigley.

Authors’ addresses: J. Suh, S. Ghorashi, G. Ramos, S. Drucker, J. Verwey, and P. Simard, Microsoft Research, 1 Microsoft Way, Redmond, WA, 98052; emails: {jinsuh, sorgh, goramos, sdrucker, joverwey, patrice}@microsoft.com; N.-C. Chen, University of Washington, Seattle, WA, 98195; email: nanchen@uw.edu.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2160-6455/2019/08-ART7 $15.00
https://doi.org/10.1145/3241379

ACM Transactions on Interactive Intelligent Systems, Vol. 10, No. 1, Article 7. Publication date: August 2019.
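The anchor topology summarized in the abstract, where items are spread according to their similarity to user-defined concepts, can be illustrated with a minimal sketch. It assumes a RadViz-style layout: anchors are spaced evenly on the unit circle, and each item is placed at the similarity-weighted average of the anchor positions, so it drifts toward the concepts it most resembles. The scoring rules are hypothetical stand-ins: an example-based anchor scores an item by its maximum cosine similarity to the anchor's example vectors, and a dictionary-based anchor scores it by the fraction of the anchor's keywords the item contains. All function names and scoring choices here are illustrative assumptions, not the article's published implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def example_anchor_score(item_vec, example_vecs):
    """Hypothetical example-based anchor: max cosine similarity to any of its examples."""
    return max(cosine(item_vec, e) for e in example_vecs)


def dictionary_anchor_score(item_tokens, keywords):
    """Hypothetical dictionary-based anchor: fraction of its keywords found in the item."""
    present = sum(1 for k in keywords if k in item_tokens)
    return present / len(keywords)


def anchor_positions(n):
    """Place n anchors evenly around the unit circle (assumed RadViz-style layout)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]


def place_item(scores, anchors):
    """Position an item at the score-weighted average of the anchor positions.

    An item with zero similarity to every anchor stays at the center (0, 0).
    """
    total = sum(scores)
    if total == 0:
        return (0.0, 0.0)
    x = sum(s * ax for s, (ax, _) in zip(scores, anchors)) / total
    y = sum(s * ay for s, (_, ay) in zip(scores, anchors)) / total
    return (x, y)


if __name__ == "__main__":
    anchors = anchor_positions(2)  # anchor 0 at (1, 0), anchor 1 at (-1, ~0)
    tokens = {"game", "score", "vote"}
    scores = [
        dictionary_anchor_score(tokens, ["game", "score"]),   # matches both keywords
        dictionary_anchor_score(tokens, ["vote", "election"]),  # matches one of two
    ]
    print(scores, place_item(scores, anchors))
```

In the usage example, the item matches the first anchor's dictionary fully and the second only half, so it lands about one third of the way from the center toward the first anchor. One consequence of this assumed weighted-average design is that items equally similar to all anchors collapse toward the center, so position alone does not distinguish "similar to everything" from "similar to nothing."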