A Performance Model for X10 Applications
What’s going on under the hood?

David Grove, Olivier Tardieu, David Cunningham, Ben Herta, Igor Peshansky, Vijay Saraswat
IBM Research
{groved,tardieu,dcunnin,bherta,igorp,vsaraswa}@us.ibm.com

Abstract

To reliably write high performance code in any programming language, an application programmer must have some understanding of the performance characteristics of the language’s core constructs. We call this understanding a performance model for the language. Some aspects of a performance model are fundamental to the programming language and are expected to be true for any plausible implementation of the language. Other aspects are less fundamental and merely represent design choices made in a particular version of the language’s implementation.

In this paper we present a basic performance model for the X10 programming language. We first describe some performance characteristics that we believe will be generally true of any implementation of the X10 2.2 language specification. We then discuss selected aspects of our implementations of X10 2.2 that have significant implications for the performance model.

1. Introduction

Programmers need an intuitive understanding of the performance characteristics of the core constructs of their programming language to be able to write applications with predictable performance. We will call this understanding a performance model for the language. Desirable characteristics of a performance model include simplicity, predictive ability, and stability across different implementations of the language. The performance model should abstract away all non-essential details of the language and its implementation, while still enabling reasoning about those details that do have significant performance impact. Languages with straightforward mappings of language constructs to machine instructions usually have fairly straightforward performance models.
As the degree of abstraction provided by the language’s constructs and/or the sophistication of its implementation increase, its performance model also tends to become more complex.

In this paper, we describe a preliminary performance model for the X10 programming language. X10 is an object-oriented language designed specifically to enable the productive programming of multi-core and multi-node computers. In addition to the expected core language features of any modern object-oriented language, it contains additional constructs for expressing fine-grained concurrency and distributed computation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. X10 Workshop 2011 date, City. Copyright © 2011 ACM [to be supplied]...$10.00

Although the rate of change of the X10 language has significantly decreased from earlier stages of the project, the language specification and its implementations are still immature when compared to languages such as C++ and Java. As such, we consider some aspects of the X10 performance model to still be evolving. Therefore, we break our presentation into two logical sections: aspects that we believe are fairly fundamental to the language itself and aspects that are more closely tied to specific choices embodied in the X10 2.2 implementations.

We begin with a brief review of the X10 language in Section 2. Section 3 discusses those aspects of the performance model that arise from fundamental aspects of the language definition. Section 4 provides an overview of the X10 2.2 implementations.
The second logical section of the performance model is presented simultaneously with a discussion of some of the central implementation decisions embodied in the X10 2.2 runtime system (Section 5) and compiler (Section 6).

2. Background

This background section briefly describes the context for the X10 project and introduces the key programming language concepts that will be discussed in later sections of the paper. A great deal more information can be found online at http://x10-lang.org. In particular, the language specification [8], programmer’s guide [3], and a collection of tutorials and sample programs are available.

The genesis of the X10 project was the DARPA High Productivity Computing Systems (HPCS) program. As such, X10 is intended to be a programming language that achieves “Performance and Productivity at Scale.” The primary hardware platforms being targeted by the language are clusters of multi-core processors linked together into a large-scale system via a high-performance network. Therefore, support for both concurrency and distribution is a first-class concern of the programming language design. The language must also support the development and use of reusable application frameworks to increase programmer productivity; this requirement motivates the inclusion of a sophisticated generic type system, closures, and object-oriented language features. Finally, like any new language, to gain acceptance X10 must be able to smoothly interoperate with existing libraries written in other languages. This last requirement constrains both the design and the implementation of X10 in various ways.

A computation in X10 consists of one or more asynchronous activities (light-weight tasks). A new activity is created by the statement async S. To synchronize activities, X10 provides the statement finish S.
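As a minimal sketch of these two constructs, the following X10 2.2 program spawns two activities inside a finish. The class name FinishAsyncSketch and the printed strings are illustrative assumptions and do not come from the paper.

```x10
import x10.io.Console;

// Hypothetical example class; not taken from the paper.
public class FinishAsyncSketch {
    public static def main(Array[String]) {
        finish {
            // Each async statement creates a new activity.
            async Console.OUT.println("child activity 1");
            async Console.OUT.println("child activity 2");
        }
        // Statement following the finish.
        Console.OUT.println("after finish");
    }
}
```

The relative order in which the two child lines are printed is not specified.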
An activity that executes a finish statement will not execute the statement after the finish until all activities spawned within the finish’s body have terminated. Every activity executes in a single Place (address space). While executing in this place, it may freely access any object that also resides in the place. It may manipulate remote references