Fixing Idioms
A recursion primitive for applicative DSLs

Dominique Devriese, Ilya Sergey, Dave Clarke, Frank Piessens
iMinds-DistriNet, KU Leuven
{firstname.lastname}@cs.kuleuven.be

Abstract

In a lazy functional language, the standard encoding of recursion in DSLs uses the host language's recursion, so that DSL algorithms automatically use the host language's least fixpoints, even though many domains require algorithms to produce different fixpoints. In particular, this is the case for DSLs implemented as Applicative functors (structures with a notion of pure computations and function application). We propose a recursion primitive afix that models a recursive binder in a finally tagless HOAS encoding, but with a novel rank-2 type that allows us to specify and exploit the effects-values separation that characterises Applicative DSLs. Unlike related approaches for Monads and Arrows, we model effectful recursion, not value recursion.

Using generic programming techniques, we define an arity-generic version of the operator to model mutually recursive definitions. We recover intuitive user syntax with a form of shallow syntactic sugar: an alet construct that syntactically resembles the let construct, which we have implemented in the GHC Haskell compiler. We describe a proposed axiom for the afix operator. We demonstrate usefulness with examples from Applicative parser combinators and functional reactive programming. We show how higher-order recursive operators like many can be encoded without special library support, unlike previous approaches, and we demonstrate an implementation of the left recursion removal transform.

Categories and Subject Descriptors D.3.3 [Language Constructs and Features]: Recursion

Keywords Applicative functors, observable recursion, HOAS

1. Introduction

Let us start with an embedded domain-specific language (EDSL) of parser rules, modelled as the GADT (see e.g. [29]) Rule.
The data type is parameterised by the type a of parse results:

    data Rule a where
      Pure  :: a -> Rule a
      Seq   :: Rule (a -> b) -> Rule a -> Rule b
      Disj  :: Rule a -> Rule a -> Rule a
      Fail  :: Rule a
      Token :: Char -> Rule Char

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PEPM'13, January 21–22, 2013, Rome, Italy. Copyright © 2013 ACM 978-1-4503-1842-6/13/01...$15.00

Rule provides the DSL primitives Pure (match the empty string, return a fixed result), Seq (sequence two rules, apply the result of the first to that of the second), Disj (choose between two rules), Fail and Token (parse and return a specified character). Rule uses the Applicative parser combinator style introduced and popularised by Swierstra et al. [31]. Readers may recognise Seq and Pure as the <*> and pure operators of an Applicative functor. With Haskell's a `f` b for f a b, the following bs and bs' :: Rule String model a language of arbitrary-length sequences of bs:

    bs  = (Pure (:) `Seq` Token 'b' `Seq` bs) `Disj` Pure ""
    bs' = (Pure snoc `Seq` bs' `Seq` Token 'b') `Disj` Pure ""

According to rule bs, matches either start with token 'b', followed by another match of bs, or they are empty. The rule Pure (:) has no parse behaviour of its own, but produces the list cons operator (:) as a result, so that the parsed token 'b' is consed with the result of the recursive match. For an empty match, the empty string "" is returned. Parser rule bs' defines the same language and parse results but expects the recursive match first, and the token 'b' second. bs' is left-recursive: it refers to itself in a left-most position.
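As a minimal sketch, the Rule type and the two rules above can be written in plain GHC Haskell as follows. The second rule is named bs' here to keep it distinct from bs. The helpers snoc and parseNaive are assumptions introduced only for illustration: snoc is taken to append a character to a string, and parseNaive is a naive list-of-successes interpreter, not an algorithm from this paper.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- The parser-rule DSL from the text, parameterised by the result type.
data Rule a where
  Pure  :: a -> Rule a
  Seq   :: Rule (a -> b) -> Rule a -> Rule b
  Disj  :: Rule a -> Rule a -> Rule a
  Fail  :: Rule a
  Token :: Char -> Rule Char

-- Assumed helper: append one character to the end of a string.
snoc :: String -> Char -> String
snoc xs x = xs ++ [x]

-- Right-recursive rule: a 'b' followed by more bs, or nothing.
bs :: Rule String
bs = (Pure (:) `Seq` Token 'b' `Seq` bs) `Disj` Pure ""

-- Left-recursive rule: the recursive match comes first.
bs' :: Rule String
bs' = (Pure snoc `Seq` bs' `Seq` Token 'b') `Disj` Pure ""

-- Hypothetical naive top-down interpreter: returns every
-- (result, remaining input) pair, trying the left alternative first.
parseNaive :: Rule a -> String -> [(a, String)]
parseNaive (Pure x)   s = [(x, s)]
parseNaive (Seq f a)  s =
  [ (g x, s2) | (g, s1) <- parseNaive f s, (x, s2) <- parseNaive a s1 ]
parseNaive (Disj a b) s = parseNaive a s ++ parseNaive b s
parseNaive Fail       _ = []
parseNaive (Token c) (x:xs) | c == x = [(c, xs)]
parseNaive (Token _)  _ = []

main :: IO ()
main = print (parseNaive bs "bb")
-- prints [("bb",""),("b","b"),("","bb")]
-- Note: parseNaive bs' "bb" is deliberately not run; it would recurse
-- on bs' without consuming input and never return.
```

Both rules are ordinary cyclic Haskell values, so any function that traverses them structurally sees an infinite tree. The sketch terminates on bs only because Token 'b' consumes input before the recursive occurrence is reached.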
The algorithm nullable :: Rule a -> Bool checks if a rule accepts the empty string:

    nullable (Pure _)   = True
    nullable (Seq a b)  = nullable a && nullable b
    nullable (Disj a b) = nullable a || nullable b
    nullable Fail       = False
    nullable (Token _)  = False

This definition is satisfactory for finite rules, but a problem arises for infinite, recursive production rules like bs and bs' above. A known problem of parser DSLs like Rule is that left-recursive rules are not treated well. Unlike nullable bs (which is True), nullable bs' is ⊥: the computation loops forever. Computationally, nullable bs' loops when considering the left-most part of its first alternative. Denotationally, nullable bs' corresponds to the fixpoint of a certain function, and the least fixpoint we get from Haskell is not the one we would like.

This example shows that for DSLs like our grammar model, algorithms need more control of how fixpoints are calculated. As such, it is inappropriate to rely on Haskell's least fixpoints. Otherwise, parsing libraries are restricted to top-down parsing algorithms, left-recursion is difficult (although some algorithms deal with it anyway, e.g. [16]) and some algorithms are impossible (e.g. print a representation of a rule's parsing structure). Also for DSLs with re-