Leveraging Wikipedia for Ontology Pattern Population from Text

Michelle Cheatham and James Lambert
Wright State University
3640 Colonel Glenn Hwy.
Dayton, OH 45435

Charles Vardeman II
University of Notre Dame
111C Information Technology Center
Notre Dame, IN 46556

Abstract

Traditional approaches to populating ontology design patterns from unstructured text often involve using a dictionary, rules, or machine learning approaches established on a training set of annotated documents. While these approaches are quite effective in many cases, performance can suffer over time as the nature of the text documents changes to reflect advances in the domain of interest. This is particularly true when attempting to populate patterns related to fast-changing domains such as technology, medicine, or law. This paper explores the use of Wikipedia as a source of continually updated background knowledge to facilitate ontology pattern population as the domain changes over time.

Introduction

Two of the central underpinnings of scientific inquiry are the need to verify results by reproducing the experiments involved and the importance of "building on the shoulders of giants." For these things to be possible, experimental results must be both discoverable and reproducible. Important steps have recently been taken in pursuit of this goal, including the relaxation of page restrictions on "methodology" sections in many academic journals and the requirement by some funding agencies that investigators make any data they collect publicly available. However, for previous work to be truly verifiable and reusable, researchers must be able not only to access the results of those efforts but also to understand the context in which they were created. A key element of this is the need to preserve the underlying computations and analytical process that led to prior results in a generic machine-readable format.
In previous work toward this goal, we developed an ontology design pattern (ODP) to represent the computational environment in which an analysis was performed. This model is briefly described in the Computational Environment Representation section of this work, and more detail is available in (Cheatham et al. 2017). This paper describes our work on the next step: development of an automated approach to populate the ODP based on data extracted from academic articles. We explore the performance of two common approaches to this task and show that, due to the fast-changing nature of computer technology, they lose their effectiveness over time. We then evaluate the utility of using a continuously manually-curated knowledge base to mitigate this performance degradation. The results illustrate that this method holds some promise.

Copyright held by the author(s). In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

The remainder of this paper is organized as follows. The section Computational Environment Representation presents the schema of the ontology we seek to populate, while the Dataset section describes the collection of academic articles we use as our training and test sets. The approach and results are presented and analyzed next, and finally some conclusions and ideas for future work in this area are discussed.

Computational Environment Representation

The Computational Environment ODP was developed over the course of several working sessions by a group of ontological modeling experts, library scientists, and domain scientists from different fields, including computational chemists and high-energy physicists interested in preserving analyses of data collected from the Large Hadron Collider at CERN.
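For concreteness, the first of the competency questions discussed below ("What environment do I need to put in place in order to replicate the work in Paper X?") could be answered over instance data shaped roughly like the following minimal sketch. All class and property names here (e.g., ce:performedIn, ce:hasComponent) are hypothetical placeholders, not the actual schema, which is given in (Cheatham et al. 2017); triples are represented as plain tuples to keep the sketch dependency-free.

```python
# Toy instance data for a hypothetical computational-environment pattern.
# Identifiers are illustrative only, not the published ODP vocabulary.
triples = {
    ("paperX_analysis", "rdf:type", "ce:ComputationalAnalysis"),
    ("paperX_analysis", "ce:performedIn", "envX"),
    ("envX", "rdf:type", "ce:ComputationalEnvironment"),
    ("envX", "ce:hasComponent", "Ubuntu 16.04"),
    ("envX", "ce:hasComponent", "NumPy 1.11"),
}

def environment_of(analysis, triples):
    """Return the components of the environment(s) in which the given
    analysis was performed -- i.e., what must be put in place to replicate it."""
    envs = {o for s, p, o in triples
            if s == analysis and p == "ce:performedIn"}
    return sorted(o for s, p, o in triples
                  if s in envs and p == "ce:hasComponent")

components = environment_of("paperX_analysis", triples)
# components == ["NumPy 1.11", "Ubuntu 16.04"]
```

In a real deployment the same query would be posed in SPARQL against an RDF store holding instances of the pattern, but the shape of the question is the same: follow the analysis to its environment, then enumerate that environment's components.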
Our goal was to arrive at an ontology design pattern that is capable of answering the following competency questions:

- What environment do I need to put in place in order to replicate the work in Paper X?
- There has been an error found in Script Y. Which analyses need to be re-run?
- Based on recent research in Field Z, what tools and resources should new students work to become familiar with?
- Are the results from Study A and Study B comparable from a computational environment perspective?

We focused on creating a model to capture the actual environment present during a computational analysis. Representing all possible environments in which it is feasible for the analysis to be executed is outside of the scope of our current effort. We also do not include the runtime configuration and parameters as part of the environment. The rationale is that to some extent the same environment should be applicable to many computational analyses in the same