Composable and Hygienic Typed Syntax Macros*

Cyrus Omar, Chenglong Wang, Jonathan Aldrich
Carnegie Mellon University
{comar, stwong, aldrich}@cs.cmu.edu

* This paper uses color for clarity of exposition.

SAC ’15, April 13-17, 2015, Salamanca, Spain. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-3196-8/15/04 ...$15.00. http://dx.doi.org/10.1145/2695664.2695936

ABSTRACT

Syntax extension mechanisms are powerful, but reasoning about syntax extensions can be difficult. Recent work on type-specific languages (TSLs) addressed reasoning about composition, hygiene and typing for extensions that introduce new literal forms. We supplement TSLs with typed syntax macros (TSMs), which, unlike TSLs, are explicitly invoked to give meaning to delimited segments of arbitrary syntax. To maintain a typing discipline, we describe two flavors of term-level TSMs: synthetic TSMs specify the type of the terms they generate, while analytic TSMs can generate terms of arbitrary type but can only be used in positions where the type is otherwise known. At the level of types, we describe a third flavor of TSM that generates a type of a specified kind along with its TSL, and we show use cases where the two mechanisms operate in concert.

Categories and Subject Descriptors

D.3.2 [Programming Languages]: Language Classifications—Extensible languages

Keywords

extensible syntax; macros; hygiene; type inference

1. INTRODUCTION

One way programming languages evolve is by introducing syntactic sugar that captures common idioms more concisely and naturally. In most contemporary languages, this is the responsibility of the language designer. Unfortunately, the designers of general-purpose languages have little incentive to capture idioms that arise only in particular settings, which motivates research into mechanisms that allow the users of a language to extend it with new syntactic sugar themselves.

Designing a useful syntax extension mechanism is non-trivial because the language designer can no longer comprehensively check that parsing ambiguities cannot arise and that desugarings are semantically well-behaved. Instead, the extension mechanism itself must provide several key guarantees:

Composability. The mechanism cannot simply allow the base language’s syntax to be modified arbitrarily, because of the potential for parsing ambiguities arising both from conflicts with the base language and, critically, from conflicts between extensions (e.g. extensions adding support for XML and HTML).

Hygiene. The desugaring logic associated with new forms must be constrained so that the meaning of a valid program cannot change simply because some of its variables have been uniformly renamed (manually or by a refactoring tool). It should also be straightforward to identify the binding site of a variable, even with intervening uses of sugar. These two requirements guard against inadvertent variable capture and shadowing by the desugaring.

Typing Discipline. In a rich statically typed language, which is our focus in this work, it should be possible to determine the type a sugared term will have, and analogously the kind a type will have (discussed further below), without performing the desugaring. This aids both the programmer and tools such as type-aware code editors.
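The hygiene requirement can be made concrete with a small experiment. The following is a minimal Python sketch, assuming a hypothetical four-construct object language (literals, variables, `let`, `if`) and a hypothetical `or` sugar that desugars `a or b` into `let t = a in if t then t else b`. The naive desugaring always binds `t`, so uniformly renaming a user variable from `t` to `u` changes the program's meaning; the hygienic variant avoids this by choosing a temporary that occurs in neither operand. Fresh renaming is only one standard way to obtain hygiene and is not presented here as the mechanism used by TSLs or TSMs.

```python
from itertools import count

# Hypothetical object language, for illustration only:
#   ("lit", v) | ("var", x) | ("let", x, rhs, body) | ("if", cond, then, else)


def evaluate(term, env):
    """Evaluate a term under an environment mapping names to values."""
    tag = term[0]
    if tag == "lit":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "let":
        _, name, rhs, body = term
        return evaluate(body, {**env, name: evaluate(rhs, env)})
    if tag == "if":
        _, cond, then_branch, else_branch = term
        branch = then_branch if evaluate(cond, env) else else_branch
        return evaluate(branch, env)
    raise ValueError(f"unknown term: {term!r}")


def names_in(term):
    """Collect every variable name mentioned anywhere in a term."""
    tag = term[0]
    if tag == "lit":
        return set()
    if tag == "var":
        return {term[1]}
    if tag == "let":
        _, name, rhs, body = term
        return {name} | names_in(rhs) | names_in(body)
    _, cond, then_branch, else_branch = term
    return names_in(cond) | names_in(then_branch) | names_in(else_branch)


def naive_or(a, b):
    # Always binds the fixed name "t", so a "t" inside b gets captured.
    return ("let", "t", a, ("if", ("var", "t"), ("var", "t"), b))


def hygienic_or(a, b):
    # Picks a temporary that occurs in neither operand, so nothing in b is captured.
    used = names_in(a) | names_in(b)
    tmp = next(f"t{i}" for i in count() if f"t{i}" not in used)
    return ("let", tmp, a, ("if", ("var", tmp), ("var", tmp), b))
```

Evaluating `naive_or(("lit", False), ("var", "t"))` under an environment binding `t` to 42 yields `False`, whereas the same program with `t` renamed to `u` yields 42, so renaming changed the meaning. Both corresponding `hygienic_or` programs yield 42.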
Most prior approaches to syntax extension, discussed in Sec. 5, fail to provide all of these guarantees simultaneously. Recent work on type-specific languages (TSLs) makes these guarantees, but in a limited setting: library providers can define new literal syntax by associating parsing and desugaring logic with type declarations [13]. Local type inference, specified as a bidirectional type system [15], controls which TSL is used to parse the body of each literal form. The available delimiters are fixed by the language, but the bodies of literal forms can be arbitrary, so TSLs are flexible. Because the delimiters are fixed and the TSL is chosen by the type rather than by modifying the grammar, this approach guarantees composability and maintains the typing discipline by construction. The semantics given also guarantees hygiene. We review TSLs in Sec. 2.

While many forms of syntactic sugar can be realized as TSLs, there remain situations where TSLs do not suffice:
(i) Only a single TSL can be associated with a type, and only when the type is declared, so alternative syntactic choices (which are common [17]) cannot be defined, nor can syntax for a type that is not under the programmer’s control.
(ii) Syntax cannot be associated with types that are not identified nominally (e.g. arrow types).
(iii) Idioms other than those that arise when introducing a value of a type (e.g. those related to control flow or API protocols) cannot be captured.
(iv) Types cannot themselves be declared using specialized syntax.

Contributions. In this paper, we introduce typed syntax macros (TSMs), which supplement TSLs to handle these scenarios while maintaining the crucial guarantees above.
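The interplay between bidirectional typing, TSLs, and the two flavors of term-level TSM described in the abstract can be sketched in a few lines. The Python below is a rough model under stated assumptions, not the paper's calculus: the types `HTML`, `URL` and `Config`, the TSMs `url` and `json`, and the functions `synth` and `check` are all hypothetical, and desugarings are represented as strings rather than as terms that are themselves type-checked. In synthetic mode, a bare literal is rejected because no type is available to select a TSL, a synthetic TSM reports its declared type without being expanded, and an analytic TSM is rejected. In analytic mode, the expected type either selects the TSL or is passed to the TSM. The `TSLS` table also mirrors limitation (i), since it holds at most one entry per type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class StaticTypeError(Exception):
    """Raised when a term cannot be typed in the requested mode."""


@dataclass(frozen=True)
class Literal:
    """A delimited literal body whose meaning is given by a TSL."""
    body: str


@dataclass(frozen=True)
class MacroApp:
    """An explicit TSM invocation applied to a delimited body."""
    macro: str
    body: str


@dataclass(frozen=True)
class TSM:
    expand: Callable[[str, str], str]     # (body, type) -> desugaring
    declared_type: Optional[str] = None   # set: synthetic; None: analytic


# Hypothetical registry: at most one TSL per nominal type.
TSLS: Dict[str, Callable[[str], str]] = {
    "HTML": lambda body: f"HTML.parse({body!r})",
}

# Hypothetical TSMs, one of each term-level flavor.
TSMS: Dict[str, TSM] = {
    "url": TSM(lambda body, ty: f"URL.parse({body!r})", declared_type="URL"),
    "json": TSM(lambda body, ty: f"{ty}.fromJSON({body!r})"),
}


def synth(term) -> str:
    """Synthesize a type for `term` without expanding anything."""
    if isinstance(term, MacroApp):
        tsm = TSMS[term.macro]
        if tsm.declared_type is not None:
            return tsm.declared_type
        raise StaticTypeError(f"analytic TSM {term.macro!r} needs a known type")
    if isinstance(term, Literal):
        raise StaticTypeError("a literal needs a known type to select its TSL")
    raise StaticTypeError(f"unsupported term: {term!r}")


def check(term, expected: str) -> str:
    """Analyze `term` against `expected` and return its desugaring."""
    if isinstance(term, Literal):
        if expected not in TSLS:
            raise StaticTypeError(f"no TSL is associated with {expected}")
        return TSLS[expected](term.body)
    if isinstance(term, MacroApp):
        tsm = TSMS[term.macro]
        if tsm.declared_type is not None and tsm.declared_type != expected:
            raise StaticTypeError(
                f"{term.macro!r} generates {tsm.declared_type}, not {expected}")
        return tsm.expand(term.body, expected)
    raise StaticTypeError(f"unsupported term: {term!r}")
```

Keeping `declared_type` separate from `expand` is what lets `synth` answer without running any desugaring logic, which corresponds to the typing-discipline guarantee stated earlier.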