Semi-Automatic Knowledge Acquisition Through CODA

Manuel Fiorelli, Riccardo Gambella, Maria Teresa Pazienza, Armando Stellato, Andrea Turbati

ART Research Group, Dept. of Enterprise Engineering (DII),
University of Rome, Tor Vergata
Via del Politecnico, 1, 00133 Rome, Italy
{fiorelli, pazienza, stellato, turbati}@info.uniroma2.it; gambella.riccardo@gmail.com

Abstract. In this paper, we illustrate the benefits deriving from the adoption of CODA (Computer-aided Ontology Development Architecture) for the semi-automatic acquisition of knowledge from unstructured information. Based on UIMA for the orchestration of analytics, CODA promotes the reuse of independently developed information extractors, while providing dedicated capabilities for projecting their output as RDF triples conforming to a user-provided vocabulary. CODA introduces a clear workflow for the coordination of concurrently working teams through the incremental definition of a limited number of shared interfaces. In the proposed semi-automatic knowledge acquisition process, humans can validate the automatically produced triples, or refine them to increase their relevance to a specific domain model. An experimental user interface aims to raise the efficiency and effectiveness of human involvement: for instance, candidate refinements are suggested on the basis of metadata about the triples to be refined and of the knowledge already assessed in the target semantic repository.

Keywords: Human-Computer Interaction, Ontology Engineering, Ontology Population, Text Analytics, UIMA

1. Introduction

Efficient Information Management and Information Gathering are becoming extremely important to derive value from the large amount of available information. While the uptake of Linked Data [1] has promoted uniform standards for the publication of information as interlinked datasets, the Web still consists mainly of unstructured content. Dealing with this heterogeneous content calls for the coordinated capabilities of several dedicated tools.
In fact, developing knowledge acquisition systems today largely requires non-trivial integration effort and the creation of ad hoc solutions for tasks that could be better defined and channeled into an organic approach. Platforms such as GATE [2] and UIMA [3] provide standard support for content analytics, while leaving tasks concerning data transformation and publication entirely to developers. There have been a few attempts at completing information extraction architectures with facilities for the generation of RDF