Algorithm for population of Object Property Assertions derived from Telecom Contact Centre Product Support Documentation

Alexandre Kouznetsov, Jonas B. Laurila and Christopher J.O. Baker
CSAS, University of New Brunswick
Saint John, Canada
e-mail: alexk@unb.ca

Bradley Shoebottom
Innovatia Inc.
Saint John, Canada
e-mail: bradley.shoebottom@innovatia.net

Abstract: Relaying of information from technical documentation by contact centre workers to assist clients is limited by industry-standard storage formats and query mechanisms. Here we present and evaluate a new methodology for processing technical documents and tagging them against a Telecommunications Hardware domain ontology. We deploy classical ontological NLP approaches to extract information from both text segments and tables, identifying text segments, named entities and relations between named entities described by an existing T-Box. We describe a method for scoring candidate object property assertions derived from text before populating the Telecom Hardware ontology. In our algorithm, we leverage customized gazetteer lists, including lists specific to object property synonyms, and use functions of distance between co-occurring terms to score candidate A-Box object property assertions. We review the performance of this approach with a use case involving Tier 1 and Tier 2 call centre agents using a visual query tool, Top Braid Live, to interrogate the instantiated Telecom Hardware ontology for information relevant to the needs of clients.

Keywords: object property assertions; scoring algorithm; ontological natural language processing; OWL ontology; OWL ontology population; technical support; telecom contact centre.

I. INTRODUCTION

Product technical support is a significant cost to manufacturers. Telecommunications technical support teams spend 25 to 50% of their time searching for case-specific answers. Industry needs to reduce this non-productive search time.
Search tasks in the contact centre occur in poorly integrated repositories containing case notes in customer relationship management (CRM) systems and technical documentation. Nothing links previous cases, symptoms, possible causes, and suggested solutions to procedures from technical publications. The underlying strategy for data integration of technical documentation with CRM databases includes text mining for pertinent information and its integration with structured knowledge. Our technical solution comprises Ontological Natural Language Processing (ONLP) involving named entity recognition, relation detection, ontology instantiation and knowledge-based interrogation with SPARQL and visual querying. To support this paradigm, we also examine the problem of populating the correct relations between individuals. We address this problem with a novel algorithm that scores candidate A-Box object property assertions according to the textual occurrences of relations and how close they are to the textual descriptions of their respective domain and range in the T-Box.

II. RELATED WORK

Similar approaches have already been explored elsewhere, such as scientific knowledge discovery in the lipidomics domain, where new insights can be found by querying a lipid ontology instantiated through text mining [1]. Another example, from the protein engineering domain, is [2], which also proposes the general term ontological text mining for approaches that use an ontology to support text mining, which in turn extends the ontology with instances. Related work in the area of relation extraction includes: (i) the relation learning part of Ontology Learning, which takes a set of concepts and a corpus as input and identifies T-Box level relations [3], (ii) manually created rules that, for given T-Box relations, are capable of extracting A-Box relations [4], and (iii) automatic creation of rules using machine learning approaches [5], which can also be weighted according to their relative accuracy [6].
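The proximity-based scoring idea introduced in Section I can be illustrated with a minimal sketch. It is not the authors' implementation. The scoring function 1 / (1 + d_subject + d_object), the gazetteer entries, the property names and the function `score_candidates` are all assumptions made for illustration, since this part of the text does not specify the exact distance function.

```python
from dataclasses import dataclass

# Hypothetical gazetteer: object property name -> predicate synonyms.
# Both the property names and the synonyms are illustrative only.
PREDICATE_SYNONYMS = {
    "connectsTo": {"connects", "attaches", "plugs"},
    "replaces": {"replaces", "supersedes"},
}


@dataclass(frozen=True)
class Candidate:
    subject: str
    predicate: str
    obj: str
    score: float


def score_candidates(tokens, domain_terms, range_terms,
                     synonyms=PREDICATE_SYNONYMS):
    """Score candidate (subject, property, object) assertions in one segment.

    Assumed scoring function: 1 / (1 + d_subject + d_object), where each d
    is the token distance between a predicate synonym and a co-occurring
    domain or range term. Closer co-occurrences score higher.
    """
    lowered = [t.lower() for t in tokens]
    subjects = [(i, t) for i, t in enumerate(tokens) if t in domain_terms]
    objects = [(i, t) for i, t in enumerate(tokens) if t in range_terms]

    candidates = []
    for prop, syns in synonyms.items():
        # Positions where any synonym of this property occurs.
        pred_positions = [i for i, t in enumerate(lowered) if t in syns]
        for p in pred_positions:
            for si, s in subjects:
                for oi, o in objects:
                    if s == o:
                        continue  # skip self-relations
                    score = 1.0 / (1 + abs(p - si) + abs(p - oi))
                    candidates.append(Candidate(s, prop, o, score))

    # Highest-scoring candidates first.
    return sorted(candidates, key=lambda c: c.score, reverse=True)
```

For the hypothetical segment "The RouterX connects to the SwitchY", with `RouterX` as a domain term and `SwitchY` as a range term, the only candidate is `connectsTo(RouterX, SwitchY)`. The synonym sits one token from the subject and three from the object, so under the assumed function the score is 1 / (1 + 1 + 3) = 0.2.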
Our scenario differs from (i) in that we already have a well-defined T-Box, from (ii) in that we have many different relations to consider, making manual creation of rules cumbersome, and from (iii) in that we do not have enough training data for every relation. Instead, we use a gazetteer-based approach in which synonyms of predicates are used in conjunction with simple co-occurrence of domain and range terms. Using gazetteer lists for predicate terms also makes it possible to differentiate relations that share the same domain and range. We propose a semi-automatic approach for knowledge discovery which is based on manual creation and curation of a T-Box ontology together with synonym lists of