AnchorViz: Facilitating Semantic Data Exploration and
Concept Discovery for Interactive Machine Learning
JINA SUH, SOROUSH GHORASHI, and GONZALO RAMOS, Microsoft Research,
Redmond, Washington
NAN-CHEN CHEN, University of Washington, Seattle, Washington
STEVEN DRUCKER, JOHAN VERWEY, and PATRICE SIMARD, Microsoft Research,
Redmond, Washington
When building a classifier in interactive machine learning (iML), human knowledge about the target class
can be a powerful reference to make the classifier robust to unseen items. The main challenge lies in finding
unlabeled items that can either help discover or refine concepts for which the current classifier has no
corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an
exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz,
an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts
through human-driven semantic data exploration. By creating example-based or dictionary-based anchors
representing concepts, users create a topology that (a) spreads data based on their similarity to the concepts
and (b) surfaces the prediction and label inconsistencies between data points that are semantically related.
Once such inconsistencies and errors are discovered, users can encode the new information as labels or
features and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated
AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction
errors than stratified random and uncertainty sampling methods. Furthermore, during the beginning stages
of a training task, an iML tool with AnchorViz can help users build classifiers comparable to the ones built
with the same tool with uncertainty sampling and keyword search, but with fewer labels and more
generalizable features. We discuss exploration strategies observed during the two studies and how AnchorViz supports
discovering, labeling, and refining of concepts through a sensemaking loop.
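The way anchors spread data by similarity can be illustrated with a minimal sketch. It assumes, purely for illustration, that anchors sit evenly on a circle and that each item is drawn at the similarity-weighted average of the anchor positions; the abstract does not specify the layout rule, and the function names `anchor_positions` and `place_item` are hypothetical.

```python
import math


def anchor_positions(n_anchors, radius=1.0):
    """Place n_anchors evenly on a circle (an assumed layout, not from the article)."""
    return [
        (radius * math.cos(2 * math.pi * k / n_anchors),
         radius * math.sin(2 * math.pi * k / n_anchors))
        for k in range(n_anchors)
    ]


def place_item(similarities, anchors):
    """Return the similarity-weighted average of the anchor positions.

    similarities: non-negative similarity of one item to each anchor.
    Items with no similarity to any anchor are placed at the center.
    """
    total = sum(similarities)
    if total == 0:
        return (0.0, 0.0)
    x = sum(s * ax for s, (ax, _) in zip(similarities, anchors)) / total
    y = sum(s * ay for s, (_, ay) in zip(similarities, anchors)) / total
    return (x, y)


if __name__ == "__main__":
    anchors = anchor_positions(3)
    # An item similar mostly to the first anchor is pulled toward it.
    print(place_item([0.9, 0.1, 0.0], anchors))
```

Under these assumptions, items that resemble the same concepts land near one another, so a labeled item and a neighboring item with a conflicting prediction become visually adjacent.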
CCS Concepts: • Human-centered computing → Interactive systems and tools;
Additional Key Words and Phrases: Interactive machine learning, visualization, error discovery, semantic data
exploration, unlabeled data, concept discovery, machine teaching
ACM Reference format:
Jina Suh, Soroush Ghorashi, Gonzalo Ramos, Nan-Chen Chen, Steven Drucker, Johan Verwey, and Patrice
Simard. 2019. AnchorViz: Facilitating Semantic Data Exploration and Concept Discovery for Interactive Machine
Learning. ACM Trans. Interact. Intell. Syst. 10, 1, Article 7 (August 2019), 38 pages.
https://doi.org/10.1145/3241379
The reviewing of this article was managed by special issue associate editors Mark Billinghurst, Margaret Burnett, and
Aaron Quigley.
Authors’ addresses: J. Suh, S. Ghorashi, G. Ramos, S. Drucker, J. Verwey, and P. Simard, Microsoft Research, 1 Microsoft Way,
Redmond, WA, 98052; emails: {jinsuh, sorgh, goramos, sdrucker, joverwey, patrice}@microsoft.com; N.-C. Chen, University
of Washington, Seattle, WA, 98195; email: nanchen@uw.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be
honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
2160-6455/2019/08-ART7 $15.00
https://doi.org/10.1145/3241379
ACM Transactions on Interactive Intelligent Systems, Vol. 10, No. 1, Article 7. Publication date: August 2019.