Reconciling Function and Structure in Scientific Models

Scott Friedman, Mark Burstein, David McDonald
SIFT, Minneapolis, MN, USA
{friedman, burstein, dmcdonald}@sift.net

James Pustejovsky, Peter Anick
Brandeis University, Waltham, MA, USA
{jamesp, panick}@cs.brandeis.edu

Rusty Bobrow
Bobrow Computational Intelligence, LLC
rjbobrow@gmail.com

Brent Cochran
Tufts University School of Medicine, Boston, MA, USA
brent.cochran@tufts.edu

Abstract

Despite our increasing understanding of the structure and dynamics of scientific domains, functional knowledge and functional language, such as referring to a central purpose or function of a molecule, permeate scientific articles. Cognitive systems that collaborate with scientists must therefore represent functional knowledge to support machine reading and explanation. This paper describes our progress on automatically inferring and representing functional knowledge in R3 (Reading, Reasoning, and Reporting). R3 automatically reads biology articles from PubMed Central, using a massive domain model from Pathway Commons (www.pathwaycommons.org/) as background knowledge. R3 now relates functional language to its background structural model and explains functional knowledge, which is the central contribution of this paper. We motivate the representation of functional knowledge in the biology domain, which many existing ontologies omit, using examples from PubMed articles. We then describe how R3 automatically adds functional knowledge to its model by parsing textual summaries of biological processes and extracting semantics. We then describe how R3 builds event structures and compositional models with functional knowledge, and we illustrate how R3 uses its functional knowledge to diagram protein activity from the information it learned from reading.
Introduction

The concepts and factors we use to model scientific domains for our intelligent systems are often incommensurable with the concepts and factors we use to communicate scientific findings to our human peers. This is for a good reason, since intelligent systems and humans often serve complementary roles in the scientific process: machines engage in parallel search and discovery over vast structural models and networks of entities, while people frequently learn and communicate salient forms of entities with functional or intentional language. For instance, biologists often describe proteins and other natural kinds with functional contextual descriptors such as “active” and “inactive,” and they compactly refer to the “activity” of an entity as its central function within a complex system.

Biologists often use artifactual mental models, such as molecular switches, to describe and reason about proteins. The molecular switch metaphor explicitly describes natural kinds (i.e., proteins) as artifacts (i.e., on/off switches), rather than just describing the behavior or capability of the natural kinds. For instance, this sentence from Akinleye et al. (2013) describes the proteins of the Ras family as molecular switches that are inactive (i.e., functionally off) when bound to GDP and active (i.e., on) when bound to GTP:

“H-Ras, K-Ras, and N-Ras function as molecular switches when an inactive Ras-GDP is converted into an active Ras-GTP.”

This relates a structural change (i.e., GDP/GTP binding) to a contextual function: when bound to GTP, Ras is able to perform its agreed-upon function (as opposed to many other reactions that Ras engages in) within a specific cell signaling pathway.
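
The relation between structural state and functional descriptor in the Ras example can be sketched in a few lines of Python. This is only an illustrative sketch and not R3's representation: the names `Nucleotide`, `RasProtein`, `functional_state`, and `convert_to_gtp_bound` are hypothetical, and the mapping encodes only the GDP/GTP rule stated in the quoted sentence.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Nucleotide(Enum):
    """Structural fact: which nucleotide is bound (hypothetical type)."""
    GDP = "GDP"
    GTP = "GTP"


@dataclass(frozen=True)
class RasProtein:
    """A Ras-family protein described only by structural knowledge."""
    name: str
    bound: Nucleotide

    def functional_state(self) -> str:
        # Functional descriptor derived from structure, following the
        # molecular-switch reading: GTP-bound is "active" (on),
        # GDP-bound is "inactive" (off).
        return "active" if self.bound is Nucleotide.GTP else "inactive"


def convert_to_gtp_bound(ras: RasProtein) -> RasProtein:
    """Structural change: return a copy of the protein bound to GTP."""
    return replace(ras, bound=Nucleotide.GTP)


kras_off = RasProtein("K-Ras", Nucleotide.GDP)
kras_on = convert_to_gtp_bound(kras_off)
print(kras_off.functional_state(), "->", kras_on.functional_state())
```

The point of the sketch is that "active" and "inactive" are not stored anywhere in the structural record; they are computed from it, which is the kind of link a purely structural ontology leaves implicit.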
Intelligent systems that learn by reading must bridge this structural-functional gap: given only the structural knowledge from an ontology describing a complex system, a system cannot resolve references to “active” or “inactive” entities that collectively “contribute to” some macro behavior, nor can it resolve references to the “activity” or functional capabilities of an entity. Our Reading, Reasoning, and Reporting (R3) system, developed as part of DARPA’s Big Mechanism program (Cohen, 2015), reads articles in molecular biology to extend and revise its structural and functional models of biological mechanisms (Friedman et al., submitted; McDonald et al., 2016).

Our recent extensions to R3 aim to automatically bridge the structural-functional gap. This involves extending traditional compositional modeling semantics (Falkenhainer and Forbus, 1991) to support event structure (Pustejovsky, 1991b) and telic qualia (i.e., functional descriptions) from Generative Lexicon (GL) theory (Pustejovsky, 1991a).

Extending the modeling semantics gives R3 the representational capabilities, but it also needs the content to construct and populate these models. Fortunately, the model we are extending is annotated with English summaries of the individual reactions involved, written by human experts. R3 is thus able to extend the domain model to include critical functional information by automatically reading these textual summaries embedded within the model. R3 extracts the functional semantics from the summaries and automatically extends its model by adding functional characterizations of the reactions. It automatically identifies (1) events that comprise the entity’s function, (2) structural preconditions for