Implicitly-Parallel Functional Dataflow for Productive Cloud Programming on Chameleon

Scott Krieder, Ioan Raicu
Illinois Institute of Technology
skrieder@hawk.iit.edu, iraicu@cs.iit.edu

Justin Wozniak, Michael Wilde
Argonne National Laboratory
{wozniak, wilde}@mcs.anl.gov

1. INTRODUCTION

This project explores a programming model and runtime environment that addresses the urgent yet vexing problem of simplifying the programming of distributed parallel systems. One solution that makes parallel programming implicit rather than explicit is the dataflow model. Conceived roughly 35 years ago, it has only recently been made practical through systems such as Dryad and Swift [1]. We believe that we have successfully created a base for an implicitly-parallel functional dataflow programming model, as exemplified by Swift, a workflow language for executing scientific applications. This model has been characterized as an excellent fit for the many-task computing (MTC) paradigm. Broad application classes that fit the MTC paradigm include workflows, MapReduce, high-throughput computing, and a subset of high-performance computing. MTC emphasizes using many computing resources over short periods of time to accomplish many smaller computational tasks (both dependent and independent), where the primary metrics are measured in seconds. MTC has proven successful in grid computing and supercomputing, but the distributed nature of today's cloud resources poses many challenges to the efficient support of MTC workloads. This work aims to close the programmability gap between MTC and cloud computing through an innovative parallel scripting language, Swift, which will enable MTC workloads to efficiently leverage cloud resources and thereby enable a broader class of MTC applications to use cloud systems.

This project addresses the following research problems:

- Supporting diverse cloud instances (general-purpose and memory-intensive)
- Scaling downward (broadening the spectrum of applications by decreasing leaf-function granularity)
- Scaling upward (increasing scalability to extreme-scale clouds)
- Language interoperability (integration with many languages and programming models for performing leaf tasks: C/C++, Fortran; MPI, OpenMP)
- Runtime facilities for tracing and debugging large distributed parallel workflows
- Evaluation of the programming model on applications in global crop modeling, cancer detection, glass-state materials, and biophysical dynamics

This work represents a non-traditional community for cloud systems in general and Chameleon in particular, one that focuses on MTC. MTC offers many advantages, such as improved programmability, implicit parallelism, and improved fault tolerance, which is why applications and researchers in the community have adopted it as their programming model for large-scale applications. Given the popularity of cloud infrastructures, the ability to leverage MTC on cloud architectures such as Chameleon, at large scale, is a critical step toward the acceptance of MTC as a viable programming model for future cloud computing.

2. Systems Software

The elasticity of the cloud is well suited to the Swift model: Swift can release and re-obtain nodes as the workflow's demand varies. The ability to provide a computing commons that supports cross-institution collaborations without the complexities of local authentication/authorization will pave the way for future collaborations, and the ability to extend campus resources on demand for critical deadlines is invaluable.

2.1 Swift: implicitly parallel functional dataflow language

We will integrate the Swift parallel programming system with the Chameleon platform. Swift has been used successfully in many large-scale computing applications to increase productivity in running complex applications.
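As a rough illustration of the dataflow idea behind Swift (a hypothetical Python sketch using `concurrent.futures`, not Swift syntax or the Swift runtime), independent leaf tasks whose inputs are available run in parallel without any explicit ordering, while a dependent task waits on its producers:

```python
# Conceptual sketch of implicitly-parallel dataflow, in the spirit of Swift.
# Hypothetical example only: futures stand in for dataflow variables, and
# dependencies are expressed by consuming futures' results.
from concurrent.futures import ThreadPoolExecutor

def simulate(i):
    # stand-in for an independent "leaf" task (e.g., one application run)
    return i * i

def analyze(results):
    # dependent task: can run only after all producer tasks have completed
    return sum(results)

with ThreadPoolExecutor(max_workers=4) as pool:
    # implicitly parallel: no task ordering is specified, only data dependencies
    futures = [pool.submit(simulate, i) for i in range(8)]
    total = analyze([f.result() for f in futures])

print(total)  # 0 + 1 + 4 + ... + 49 = 140
```

In Swift itself, the same effect comes from single-assignment dataflow variables: the runtime tracks which values are ready and launches every enabled task automatically.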
Its dataflow-driven programming model allows implicit, pervasive parallelism to be harnessed through automated dependency management.

2.2 GeMTC

GeMTC [2] is a CUDA-based framework for supporting many-task computing workloads on NVIDIA GPGPU devices. As shown in Figure 1, an NVIDIA GPU comprises many Streaming Multiprocessors (SMXs). An SMX contains many warps, and each warp provides 32 concurrent threads of execution. All threads within a warp run in a Single Instruction Multiple Thread (SIMT) fashion. GeMTC schedules independent computations on the GPU
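The warp-level concurrency this exposes can be sketched with back-of-the-envelope arithmetic. The SMX and warps-per-SMX counts below are illustrative assumptions (they vary by device); only the 32 threads per warp is fixed by the SIMT model:

```python
# Back-of-the-envelope sketch of warp-level task slots on a GPU.
# THREADS_PER_WARP is fixed by CUDA's SIMT model; the other two figures
# are assumptions for illustration and are device-specific.
THREADS_PER_WARP = 32   # fixed SIMT width
WARPS_PER_SMX = 64      # assumed value for illustration
NUM_SMX = 15            # assumed value (e.g., a Kepler-class part)

warp_workers = NUM_SMX * WARPS_PER_SMX          # independent warp-level task slots
print(warp_workers)                             # 960
print(warp_workers * THREADS_PER_WARP)          # 30720 concurrent threads
```

Treating each warp as an independent worker is what lets a framework like GeMTC map many small, independent MTC tasks onto a single GPU rather than devoting the whole device to one kernel.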