Fixing Idioms: A recursion primitive for applicative DSLs

Dominique Devriese, Ilya Sergey, Dave Clarke, Frank Piessens
iMinds-DistriNet, KU Leuven
{firstname.lastname}@cs.kuleuven.be

Abstract

In a lazy functional language, the standard encoding of recursion in DSLs uses the host language's recursion, so that DSL algorithms automatically use the host language's least fixpoints, even though many domains require algorithms to produce different fixpoints. In particular, this is the case for DSLs implemented as Applicative functors (structures with a notion of pure computations and function application). We propose a recursion primitive afix that models a recursive binder in a finally tagless HOAS encoding, but with a novel rank-2 type that allows us to specify and exploit the effects-values separation that characterises Applicative DSLs. Unlike related approaches for Monads and Arrows, we model effectful recursion, not value recursion.

Using generic programming techniques, we define an arity-generic version of the operator to model mutually recursive definitions. We recover intuitive user syntax with a form of shallow syntactic sugar: an alet construct that syntactically resembles the let construct, which we have implemented in the GHC Haskell compiler. We describe a proposed axiom for the afix operator. We demonstrate its usefulness with examples from Applicative parser combinators and functional reactive programming. We show how higher-order recursive operators like many can be encoded without special library support, unlike previous approaches, and we demonstrate an implementation of the left recursion removal transform.

Categories and Subject Descriptors D.3.3 [Language Constructs and Features]: Recursion

Keywords Applicative functors, observable recursion, HOAS

1. Introduction

Let us start with an embedded domain-specific language (EDSL) of parser rules, modelled as the GADT (see e.g. [29]) Rule.
The data type is parameterised by the type a of parse results:

    data Rule a where
      Pure  :: a -> Rule a
      Seq   :: Rule (a -> b) -> Rule a -> Rule b
      Disj  :: Rule a -> Rule a -> Rule a
      Fail  :: Rule a
      Token :: Char -> Rule Char

Rule provides the DSL primitives Pure (match the empty string and return a fixed result), Seq (sequence two rules and apply the result of the first to that of the second), Disj (choose between two rules), Fail (match nothing) and Token (parse and return a specified character). Rule uses the Applicative parser combinator style introduced and popularised by Swierstra et al. [31]. Readers may recognise Pure and Seq as the pure and ⊛ (written <*> in Haskell) operators of an Applicative functor.

Using Haskell's infix notation a `f` b for f a b, the following rules bs and bs' :: Rule String model a language of arbitrary-length sequences of bs:

    bs  = (Pure (:)  `Seq` Token 'b' `Seq` bs)        `Disj` Pure ""
    bs' = (Pure snoc `Seq` bs'       `Seq` Token 'b') `Disj` Pure ""

According to rule bs, a match either starts with the token 'b', followed by another match of bs, or it is empty. The rule Pure (:) has no parse behaviour of its own, but produces the list cons operator (:) as its result, so that the parsed token 'b' is consed onto the result of the recursive match. For an empty match, the empty string "" is returned. Parser rule bs' defines the same language and the same parse results, but expects the recursive match first and the token 'b' second (snoc appends an element at the end of a list). bs' is left-recursive: it refers to itself in a left-most position.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PEPM'13, January 21–22, 2013, Rome, Italy. Copyright © 2013 ACM 978-1-4503-1842-6/13/01...$15.00
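To make the Rule type and the rules bs and bs' concrete, here is a minimal, self-contained sketch that compiles with GHC. It runs the rules with a naive list-of-successes interpreter. The interpreter parse, the helper parseFull, the rule bsA, the definition of snoc and the Functor/Applicative instances are our own illustrative assumptions, not code from the paper. The instances only show how Pure and Seq play the roles of pure and <*>.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- The Rule GADT from the text.
data Rule a where
  Pure  :: a -> Rule a
  Seq   :: Rule (a -> b) -> Rule a -> Rule b
  Disj  :: Rule a -> Rule a -> Rule a
  Fail  :: Rule a
  Token :: Char -> Rule Char

-- Hypothetical instances: Pure is pure and Seq is <*>. fmap is derived
-- from them, so the laws hold only up to the parse semantics, not structurally.
instance Functor Rule where
  fmap f r = Pure f `Seq` r

instance Applicative Rule where
  pure  = Pure
  (<*>) = Seq

-- Assumed definition of snoc: append an element at the end of a list.
snoc :: [a] -> a -> [a]
snoc xs x = xs ++ [x]

bs, bs' :: Rule String
bs  = (Pure (:)  `Seq` Token 'b' `Seq` bs)        `Disj` Pure ""
bs' = (Pure snoc `Seq` bs'       `Seq` Token 'b') `Disj` Pure ""

-- bs written with the Applicative operators; structurally identical to bs.
bsA :: Rule String
bsA = ((:) <$> Token 'b' <*> bsA) `Disj` pure ""

-- Hypothetical naive top-down interpreter returning every
-- (result, remaining input) pair. It relies on Haskell's own recursion,
-- so it terminates on bs but diverges on the left-recursive bs'.
parse :: Rule a -> String -> [(a, String)]
parse (Pure x)     s = [(x, s)]
parse (Seq rf ra)  s = [(f a, s2) | (f, s1) <- parse rf s, (a, s2) <- parse ra s1]
parse (Disj r1 r2) s = parse r1 s ++ parse r2 s
parse Fail         _ = []
parse (Token c) (c' : rest) | c == c' = [(c, rest)]
parse (Token _)    _ = []

-- Results of the parses that consume the whole input.
parseFull :: Rule a -> String -> [a]
parseFull r s = [x | (x, "") <- parse r s]
```

In GHCi, parseFull bs "bbb" gives ["bbb"] and parse bs "bb" gives [("bb",""),("b","b"),("","bb")]. Evaluating parse bs' "b" diverges, because the interpreter calls parse bs' "b" again before consuming any input: a small instance of the left-recursion problem discussed next.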
The algorithm nullable :: Rule a -> Bool checks whether a rule accepts the empty string:

    nullable (Pure _)   = True
    nullable (Seq a b)  = nullable a && nullable b
    nullable (Disj a b) = nullable a || nullable b
    nullable Fail       = False
    nullable (Token _)  = False

This definition is satisfactory for finite rules, but a problem arises for infinite, recursive production rules like bs and bs' above. A known problem of parser DSLs like Rule is that left-recursive rules are not treated well. Whereas nullable bs is True, nullable bs' is ⊥: the computation loops forever. Computationally, nullable bs' loops when considering the left-most part of its first alternative. Denotationally, nullable bs' corresponds to the fixpoint of a certain function, and the least fixpoint ⊥ we get from Haskell is not the one we would like: bs' does accept the empty string, so the desired answer is True.

This example shows that for DSLs like our grammar model, algorithms need more control over how fixpoints are calculated, so it is inappropriate to rely on Haskell's least fixpoints. If we do rely on them, parsing libraries are restricted to top-down parsing algorithms, left recursion is difficult to handle (although some algorithms deal with it anyway, e.g. [16]) and some algorithms are impossible (e.g. printing a representation of a rule's parsing structure). Also for DSLs with re-
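The fixpoint in question can be made explicit. Unfolding nullable bs' once and writing x for the recursive occurrence nullable bs' gives the equation x = ((True && x) && False) || True. The sketch below is our own illustration, not the paper's afix-based solution. It implements nullable, checks that nullable bs terminates, and contrasts Haskell's fix, which diverges on this equation, with a hypothetical helper lfpFromFalse. That helper iterates from False in the two-element lattice False ≤ True and reaches True. The names stepBs', haskellAnswer and lfpFromFalse are assumptions introduced for illustration.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.Function (fix)

data Rule a where
  Pure  :: a -> Rule a
  Seq   :: Rule (a -> b) -> Rule a -> Rule b
  Disj  :: Rule a -> Rule a -> Rule a
  Fail  :: Rule a
  Token :: Char -> Rule Char

-- nullable as in the text; it inherits Haskell's least fixpoints.
nullable :: Rule a -> Bool
nullable (Pure _)   = True
nullable (Seq a b)  = nullable a && nullable b
nullable (Disj a b) = nullable a || nullable b
nullable Fail       = False
nullable (Token _)  = False

-- Assumed definition of snoc: append an element at the end of a list.
snoc :: [a] -> a -> [a]
snoc xs x = xs ++ [x]

bs, bs' :: Rule String
bs  = (Pure (:)  `Seq` Token 'b' `Seq` bs)        `Disj` Pure ""
bs' = (Pure snoc `Seq` bs'       `Seq` Token 'b') `Disj` Pure ""

-- Hypothetical: one unfolding of nullable bs' with the recursive call as x,
-- using nullable (Pure snoc) = True, nullable (Token 'b') = False and
-- nullable (Pure "") = True.
stepBs' :: Bool -> Bool
stepBs' x = ((True && x) && False) || True

-- Haskell's least fixpoint of stepBs'. True && x forces x, so evaluating
-- this diverges, exactly like nullable bs'. Do not evaluate it.
haskellAnswer :: Bool
haskellAnswer = fix stepBs'

-- Hypothetical helper: Kleene iteration from False, the bottom of the
-- lattice False <= True. For a monotone f it stops at the least fixpoint
-- in that lattice after at most two steps.
lfpFromFalse :: (Bool -> Bool) -> Bool
lfpFromFalse f = go False
  where
    go x = let x' = f x in if x' == x then x else go x'
```

Here lfpFromFalse stepBs' evaluates to True, the answer we want for bs', while haskellAnswer diverges. The iteration only works because the equation was written out by hand. nullable applied to the plain Haskell value bs' cannot observe its own recursion, which is why the paper argues for an observable recursion primitive.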