An Investigative Search Engine for the Human Trafficking Domain

Mayank Kejriwal, Pedro Szekely
Information Sciences Institute
{kejriwal,pszekely}@isi.edu

Abstract. Enabling intelligent search systems that can navigate and facet on entities, classes and relationships, rather than plain text, to answer questions in complex domains is a longstanding aspect of the Semantic Web vision. This paper presents an investigative search engine that meets some of these challenges, at scale, for a variety of complex queries in the human trafficking domain. The engine provides a real-world case study of synergy between technology derived from research communities as diverse as the Semantic Web (investigative ontologies, SPARQL-inspired querying, Linked Data), Natural Language Processing (knowledge graph construction, word embeddings) and Information Retrieval (fast, user-driven relevance querying). The search engine has been rigorously prototyped as part of the DARPA MEMEX program and has been integrated into the latest version of the Domain-specific Insight Graph (DIG) architecture, currently used by hundreds of US law enforcement agencies for investigating human trafficking. Over a hundred million ads have been indexed. The engine is also being extended to other challenging illicit domains, such as securities and penny stock fraud, illegal firearm sales, and patent trolling, with promising results.

Keywords: Knowledge graphs, Investigative search, Human trafficking, Illicit domains, Knowledge graph construction

1 Introduction

Recent studies confirm the formidable reach of illicit players both online and offline. For example, data from the National Human Trafficking Resource Center shows that human trafficking (HT) is not only on the rise in the United States, but is also a problem of international proportions [12], [21]. The advent of the Web has made the problem worse [10].
Human trafficking victims are advertised on both the Open and Dark Web, with estimates of the number of (not necessarily unique) published advertisements running into the hundreds of millions [22]. In recent years, various agencies in the US have turned to technology to help combat this problem by suggesting leads, evidence and HT indicators. An important goal is to answer entity-centric questions over noisy Web corpora crawled from a subset of Web domains known for HT-related activity. Entities are typically HT victims, such as escorts, but could also be latent entities such as vendors, who organize the activity.
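To make the notion of an entity-centric question more concrete, the following is a minimal sketch, assuming that knowledge graph construction has already extracted attributes such as phone numbers, cities and names from each ad. The `Ad` class, the helper functions `ads_for_phone` and `facet`, and the toy records in `ADS` are all hypothetical illustrations, not the engine's actual data model, query language or API. Instead of matching keywords, the query pivots on a phone-number entity and then facets the linked ads by another extracted attribute.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Ad:
    """Hypothetical, simplified ad node carrying attributes extracted from its text."""
    ad_id: str
    phones: set[str] = field(default_factory=set)
    cities: set[str] = field(default_factory=set)
    names: set[str] = field(default_factory=set)


def ads_for_phone(ads: list[Ad], phone: str) -> list[Ad]:
    """Entity-centric lookup: return every ad linked to the given phone entity."""
    return [ad for ad in ads if phone in ad.phones]


def facet(ads: list[Ad], attribute: str) -> Counter:
    """Count how many of the given ads mention each value of an attribute."""
    counts: Counter = Counter()
    for ad in ads:
        counts.update(getattr(ad, attribute))
    return counts


# Toy records (hypothetical; 555-01xx numbers are reserved for fictional use).
ADS = [
    Ad("a1", {"555-0100"}, {"houston"}, {"amy"}),
    Ad("a2", {"555-0100"}, {"dallas"}, {"amy", "jo"}),
    Ad("a3", {"555-0199"}, {"houston"}, {"kim"}),
]

if __name__ == "__main__":
    matches = ads_for_phone(ADS, "555-0100")
    print([ad.ad_id for ad in matches])          # ads linked to the phone entity
    print(facet(matches, "names").most_common())  # names co-occurring with it
```

Here the phone number acts as a bridge between ads that share no keywords, so that a question such as "which names and cities are associated with this number?" is answered over extracted entities rather than over raw text. A production system would, of course, also have to cope with noisy and conflicting extractions, which this sketch ignores.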