Discovering All Most Specific Sentences

DIMITRIOS GUNOPULOS
Computer Science and Engineering Department, University of California, Riverside
RONI KHARDON
EECS Department, Tufts University, Medford, MA
HEIKKI MANNILA
Department of Computer Science, University of Helsinki, Helsinki, Finland
SANJEEV SALUJA
LSI Logic, Milpitas, CA
HANNU TOIVONEN
Department of Computer Science, University of Helsinki, Helsinki, Finland
and
RAM SEWAK SHARMA
Computer Science and Engineering Department, University of California, Riverside

Data mining can be viewed, in many instances, as the task of computing a representation of a theory of a model or a database, in particular by finding a set of maximally specific sentences satisfying some property. We prove some hardness results that rule out simple approaches to solving the problem. The a priori algorithm has been successfully applied to many instances of the problem. We analyze this algorithm and prove that it is optimal when the maximally specific sentences are “small”. We also point out its limitations.

The work of D. Gunopulos was partially supported by National Science Foundation (NSF) CAREER Award 9984729, NSF grants IIS-9907477 and ITR 0220148, and the Department of Defense (DoD). The research of R. Khardon was supported by Office of Naval Research (ONR) grant N00014-95-1-0550 and ARO grant DAAL03-92-G-0115.

Authors’ addresses: D. Gunopulos and R. Sewak Sharma, Computer Science and Engineering Department, University of California, Riverside, Riverside, CA 92507; email: {dg;rssharma}@cs.ucr.edu; R. Khardon, EECS Department, Tufts University, Medford, MA 02155; email: roni@eecs.tufts.edu; H. Mannila, HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Helsinki, Finland; email: Heikki.Mannila@cs.helsinki.fi; S. Saluja, LSI Logic, MS E 192, 1551 McCarthy Blvd., Milpitas, CA 95035; email: sanjeev@lsil.com; H.
Toivonen, Department of Computer Science, University of Helsinki, Helsinki, Finland; email: Hannu.Toivonen@ca.helsinki.fi.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.

© 2003 ACM 0362-5915/03/0600-0140 $5.00

ACM Transactions on Database Systems, Vol. 28, No. 2, June 2003, Pages 140–174.
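To make the abstract's problem statement concrete, the following is a minimal sketch of a levelwise (a priori style) search for maximally specific sentences in the most familiar instance: the "sentences" are itemsets, the property is being frequent in a set of transactions, and the maximally specific sentences are the maximal frequent itemsets. It assumes, as holds for frequency, that the property is closed under taking subsets, so a candidate of size k+1 is only tested when all of its k-subsets already passed. The function name `levelwise_maximal`, the parameter `is_interesting`, and the toy transactions are hypothetical illustrations, not the authors' code or data.

```python
from itertools import combinations


def levelwise_maximal(items, is_interesting):
    """Levelwise search returning the maximal sets that satisfy is_interesting.

    Hypothetical sketch: assumes is_interesting is closed under subsets
    (if a set satisfies it, so does every subset), e.g. frequency >= threshold.
    Returns an empty set if no single item satisfies the property.
    """
    # Level 1: the single items that satisfy the property.
    level = [frozenset([i]) for i in sorted(items) if is_interesting(frozenset([i]))]
    interesting = set(level)
    k = 1
    while level:
        current = set(level)
        candidates = set()
        # Candidate generation: join two k-sets whose union has k+1 elements,
        # and keep it only if every k-subset is already known to be interesting.
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in current for s in combinations(union, k)
            ):
                candidates.add(union)
        # Evaluate the property only on the surviving candidates.
        level = [c for c in candidates if is_interesting(c)]
        interesting.update(level)
        k += 1
    # Keep only the maximally specific sets: no interesting proper superset.
    return {s for s in interesting if not any(s < t for t in interesting)}


# Hypothetical toy database: an itemset is "interesting" if at least
# min_support transactions contain it.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
min_support = 3


def frequent(s):
    return sum(1 for t in transactions if s <= t) >= min_support


maximal = levelwise_maximal(set().union(*transactions), frequent)
print(sorted(sorted(s) for s in maximal))
```

On this toy data every pair of a, b, c occurs in three transactions while {a, b, c} occurs in only two, so the search stops at level 2 and reports the three pairs as the maximal sets. The levelwise strategy evaluates the property on every interesting set below the maximal ones, which is why its cost is reasonable when the maximal sentences are small, as the abstract notes.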