Comput. Lang. Vol. 13, No. 3/4, pp. 149-170, 1988
Printed in Great Britain. All rights reserved
0096-0551/88 $3.00 + 0.00
Copyright © 1988 Pergamon Press plc

STRING PATTERN-MATCHING IN PROLOG

MARCO A. CASANOVA and ANTONIO L. FURTADO*
Rio Scientific Center--IBM Brasil, Estrada da Canoa, 3520, 22.610, Rio de Janeiro, RJ, Brasil

(Received 18 December 1987; in revised form 30 June 1988)

Abstract--A pattern-matching feature for the Prolog language is described. Through the use of patterns, introduced as Prolog predicates, the feature favors the specification of string handling algorithms in a declarative style. A number of convenient pre-defined patterns, adapted from SNOBOL 4, are included. The use of two-level grammars as a paradigm for developing Prolog programs incorporating the pattern-matching feature is also discussed.

Logic programming   Prolog   Pattern-matching   String processing   SNOBOL

1. INTRODUCTION

Prolog strings provide a convenient way to represent arbitrary sentences of natural or artificial languages. Unfortunately, most Prolog dialects have a very limited and often low-level set of built-in operations on character strings, such as the substring and concatenation operations. Hence, if an application involves sophisticated string manipulation, then one is almost forced to adopt the strategy of representing strings as lists of characters, since unification cannot "look inside" strings [1]. But this strategy implies that the Prolog programmer must invest some effort in mastering the various techniques for mapping strings into lists, thus diverting his attention from the application in hand. From another perspective, this strategy requires representing a data type T1 (strings) by another data type T2 (lists) and expressing the operations of T1 in terms of those available for T2, which conflicts with the current trend towards abstract data types.
This paper then describes a high-level pattern-matching feature that facilitates the specification of string handling algorithms in a declarative style by hiding all details concerning the representation of strings. The paper also includes a number of convenient pre-defined patterns, adapted from SNOBOL 4, and discusses the use of two-level grammars as a paradigm for developing Prolog programs incorporating the pattern-matching feature.

More precisely, the basic idea behind the paper goes as follows. Consider the fundamental problem of determining whether a string S satisfies some property P. The obvious solution in Prolog is to define a predicate p in such a way that S has property P if and only if p(S) is true. Property P may in turn be defined in terms of a set of properties P1, P2, ..., Pn in the sense that a string S satisfies P iff there are substrings S1, S2, ..., Sn of S that satisfy, respectively, properties P1, P2, ..., Pn. Correspondingly, in Prolog, predicate p would have a conditional definition of the form:

    p(S) <- split(S, [S1, S2, ..., Sn]) & p1(S1) & p2(S2) & ... & pn(Sn).

where the predicate split has the task of splitting S into substrings. Instead of a predicate like split, however, we introduce the match meta-predicate, leading to a more concise definition of p:

    p(S) <- match(S, p1 || p2 || ... || pn).

*On leave from the Pontificia Universidade Catolica do Rio de Janeiro.
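As an illustration only, the split-based definition of p can be sketched for n = 2 as follows. The sketch rests on several assumptions that are not part of the paper: it uses Edinburgh-style syntax (":-" and "," in place of "<-" and "&"), it represents strings as atoms, and it defines split, p1, p2 and all_between as hypothetical helpers on top of the ISO built-ins atom_concat/3 and atom_codes/2. Here P1 is taken to be "one or more lowercase letters" and P2 to be "one or more digits".

```prolog
% Sketch under stated assumptions; split/2, p1/1, p2/1 and all_between/3
% are hypothetical helpers, not predicates defined in the paper.

% split(+S, +Parts): Parts is a list of fixed length whose elements,
% concatenated in order, give the atom S. On backtracking, every way of
% cutting S into that many (possibly empty) pieces is enumerated.
split(S, [S]).
split(S, [S1|Rest]) :-
    Rest = [_|_],
    atom_concat(S1, S2, S),      % nondeterministic when S is bound
    split(S2, Rest).

% all_between(+Codes, +Lo, +Hi): every code lies in the range Lo..Hi.
all_between([], _, _).
all_between([C|Cs], Lo, Hi) :-
    C >= Lo,
    C =< Hi,
    all_between(Cs, Lo, Hi).

% p1(+S): S is a non-empty run of lowercase ASCII letters.
p1(S) :-
    atom_codes(S, [C|Cs]),
    all_between([C|Cs], 0'a, 0'z).

% p2(+S): S is a non-empty run of ASCII digits.
p2(S) :-
    atom_codes(S, [C|Cs]),
    all_between([C|Cs], 0'0, 0'9).

% p(+S): S can be split into S1 and S2 such that p1(S1) and p2(S2) hold.
p(S) :-
    split(S, [S1, S2]),
    p1(S1),
    p2(S2).
```

Under these assumptions the query ?- p(abc123). succeeds, whereas ?- p('123abc'). fails. Note that the programmer still has to name the intermediate substrings S1 and S2 explicitly, which is the bookkeeping that the match meta-predicate is intended to hide.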