Improved Algorithms for Theory Revision with Queries (Extended Abstract)

Judy Goldsmith
Dept. of Computer Science, University of Kentucky, 763 Anderson Hall, Lexington, KY 40506
goldsmit@cs.uky.edu

Robert H. Sloan
Dept. of EE & Comp. Sci., U. Illinois at Chicago, 851 S. Morgan St. Rm 1120, Chicago, IL 60607-7053
sloan@eecs.uic.edu

Balázs Szörényi
Dept. of Computer Science, University of Szeged, Hungary
sirnew@edge.stud.u-szeged.hu

György Turán
Math, Stat., & CS Dept., U. Illinois at Chicago; Research Group on AI, Hungarian Acad. of Sciences
gyt@uic.edu

Abstract

We give a revision algorithm for monotone DNF formulas in the general revision model (additions and deletions of variables) whose query complexity is polynomial in m and e and polylogarithmic in n, where m is the number of terms, e the revision distance to the target formula, and n the number of variables. We also give an algorithm for revising 2-term unate DNF formulas in the same model, with a similar query bound. Lastly, we show that the earlier query bound for revising read-once formulas in the deletions-only model can be improved to nearly match the known lower bound.

1 INTRODUCTION

A doctor has a theory about the patient and makes recommendations. They don't work. The doctor must change the theory. Rather than start from scratch, she runs diagnostics designed to lead to incremental changes in the theory. If she was nearly correct, this should be more efficient than beginning all over again.

The goal of concept learning, and indeed of all learning from examples, is to obtain a representation of a concept or function on some domain so that one can use it to predict the function's value on new instances from the domain. However, in using this function on some performance task, one may well learn that it is not exactly correct (e.g., in medical diagnosis, if the patient does not recover). Hence one wants to revise this function.
Intuitively, if one already has a roughly correct function, then altering it to be exactly correct should require much less training data than learning the function from scratch. This paper and previous work [6, 14] show that this is indeed the case.

Note that what the computational learning theory community calls a concept is often referred to as a theory in logic, and as either a theory or a knowledge base elsewhere in artificial intelligence. We will henceforth refer to the problem of revising a concept by its most common name in machine learning: theory revision.

(Support acknowledgments: Partially supported by NSF grant CCR-9610348; work done while visiting the Dept. of EECS at the University of Illinois at Chicago and the Dept. of Computer Science at Boston University. Partially supported by NSF grant CCR-9800070. Partially supported by NSF grant CCR-9800070 and OTKA T-25721.)

We frame this problem in the model of learning with membership and equivalence queries. We believe that the query model with both equivalence and membership queries is especially well suited to the theory revision problem, for two reasons. First, in practice, theory revision would be used for deployed AI systems that make mistakes, and typically a human expert would be the one to say that the system had made a mistake. So there is a human expert who is providing something like counterexamples to equivalence queries, and this human expert should be able to answer membership queries as well. Second, as we will discuss in more detail, there is evidence that it will be very difficult or impossible to make progress on theory revision using only equivalence queries (or only PAC-type sampling).

In this paper, we present three new results.
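The query model just described can be sketched as a simple oracle interface. The class name, the toy target formula, and the exhaustive counterexample search below are our own illustration (the paper itself gives no code; in the abstract model, oracle answers are simply given):

```python
# Toy sketch of the membership/equivalence query model. The "expert"
# holds a hidden target concept over n Boolean variables; the learner
# may ask:
#   - membership(x): is assignment x a positive example of the target?
#   - equivalence(h): is hypothesis h exactly the target? If not,
#     return a counterexample on which h and the target disagree.
from itertools import product

class Expert:
    def __init__(self, target, n):
        self.target = target  # function: tuple of 0/1 values -> truth value
        self.n = n

    def membership(self, x):
        return self.target(x)

    def equivalence(self, hypothesis):
        # Exhaustive search for a counterexample (fine only for tiny n;
        # in the query model this answer comes from the oracle for free).
        for x in product((0, 1), repeat=self.n):
            if hypothesis(x) != self.target(x):
                return False, x  # counterexample
        return True, None

# Target: the monotone DNF  x0 x1  OR  x2  (over 3 variables).
expert = Expert(lambda x: (x[0] and x[1]) or x[2], n=3)

# Initial (roughly correct) theory: just x2.
ok, cex = expert.equivalence(lambda x: bool(x[2]))
print(ok, cex)                       # False (1, 1, 0)
assert expert.membership((1, 1, 0))  # expert confirms it is a positive example
```

The counterexample (1, 1, 0) tells the revision algorithm that the initial theory is missing the term x0 x1; membership queries can then localize which variables to add or delete, which is the kind of incremental repair the algorithms in this paper perform.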
We show how to revise m-term monotone DNF, and 2-term unate DNF, allowing essentially arbitrary revisions to the initial theory, with query complexity polynomial in e (the minimum number of revisions needed) and polylogarithmic in n (the total number of variables); as long as e is small relative to n, this is faster than relearning the theory from scratch. Each of these results improves over a previous result for 2-term monotone DNF [6]. Additionally, we reduce the query complexity for revising read-once formulas; the new bound is very close to the known lower bound [15], and hence very close to optimal.

We next explain a bit more about the model of theory revision used here, put our results into context, and then compare the results in this paper with previous results.

1.1 MODEL OF THEORY REVISION

The key metric for theory revision is the syntactic distance between the initial theory and the target theory. The syntactic distance between a given concept representation and another concept is the minimal number of elementary operations (such as the addition or the deletion of a literal or a clause) needed to transform the given concept representation into a representation of the other concept. Our goal in theory revision is to find algorithms whose query complexity is polynomial in the syntactic difference (or revision distance) between the initial theory and the target theory, but only polylogarithmic in the total number of possible variables. Thus, this work has some similarities to the work on