KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis

Filip Ilievski 1[0000-0002-1735-0686], Daniel Garijo 1[0000-0003-0454-7145], Hans Chalupsky 1[0000-0002-8902-1662], Naren Teja Divvala 1, Yixiang Yao 1[0000-0002-2471-5591], Craig Rogers 1[0000-0002-5818-3802], Rongpeng Li 1[0000-0002-6911-8002], Jun Liu 1, Amandeep Singh 1[0000-0002-1926-6859], Daniel Schwabe 2[0000-0003-4347-2940], and Pedro Szekely 1

1 Information Sciences Institute, University of Southern California
{ilievski,dgarijo,hans,divvala,yixiangy,rogers,rli,junliu,amandeep,pszekely}@isi.edu
2 Dept. of Informatics, Pontificia Universidade Católica Rio de Janeiro
dschwabe@inf.puc-rio.br

arXiv:2006.00088v3 [cs.AI] 26 May 2021

Abstract. Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper we present KGTK, a data science-centric toolkit designed to represent, create, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applications, enabling a wide audience of developers to easily construct knowledge graph pipelines for their applications. We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet.

Resource type: Software
License: MIT
DOI: https://doi.org/10.5281/zenodo.3828068
Repository: https://github.com/usc-isi-i2/kgtk/

Keywords: knowledge graph · knowledge graph embedding · knowledge graph filtering · knowledge graph manipulation

1 Introduction

Knowledge graphs (KGs) have become the preferred technology for representing, sharing and using knowledge in applications. A typical use case is building a new knowledge graph for a domain or application by extracting subsets of several existing knowledge graphs, combining these subsets in application-specific ways, augmenting them with information from structured or unstructured sources, and computing analytics or inferred representations to support downstream applications. For example, during the COVID-19 pandemic, several efforts focused on