Accelerated Execution via Eager Release of Dependencies in Task-based Workflows

The International Journal of High Performance Computing Applications XX(X):1–17
©The Author(s) 2021
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/ToBeAssigned
www.sagepub.com/ (SAGE)

Hatem Elshazly¹, Francesc Lordan¹, Jorge Ejarque¹ and Rosa M. Badia¹

Abstract

Task-based programming models offer a flexible way to express the unstructured parallelism patterns of today's complex applications. This expressive capability is required to achieve the maximum possible performance for applications that are executed on distributed execution platforms. In current task-based workflows, tasks are launched for execution when their data dependencies are satisfied. However, even though the data that a certain task depends on might already have been produced, the execution of this task is delayed until its predecessor tasks have completely finished their execution. As a consequence of this approach to releasing dependencies, the amount of parallelism inherent in applications is limited and opportunities for performance improvement are wasted. To mitigate this limitation, we propose an eager approach for releasing data dependencies. Following this approach, the execution of tasks is not delayed until their predecessor tasks completely finish their execution; instead, tasks are launched for execution as soon as their data requirements are available. Hence, more parallelism is exposed and applications can achieve higher levels of performance by overlapping the execution of tasks. Towards achieving this goal, in this paper we propose applying two changes to task-based workflow systems. First, the dependency relationships of tasks are specified not only in terms of predecessor and successor tasks but also in terms of the data that cause these dependencies.
Second, the release of dependencies is triggered as soon as a predecessor task generates the output data, instead of waiting until the end of the predecessor's execution to release all of its dependencies. We realize this proposal using PyCOMPSs, a task-based programming model for parallelizing Python applications. Our experiments show that using an eager approach for releasing dependencies yields a performance improvement of more than 50% in total execution time compared with the default approach of releasing dependencies.

Keywords

Task-based Workflows, Partial Dependencies, Lazy Dependency Release, Eager Dependency Release, High Performance Computing, Parallel Programming, Distributed Execution

1 Introduction

The rapid increase in computational power goes hand in hand with an increasing complexity in application domains. In fields of science and engineering (e.g. computational biology, molecular dynamics, mechanical turbine simulation), it is necessary to solve complex problems that exhibit irregular patterns. These patterns are characterized by their complex computation flows, access patterns and execution branches. Such problems are usually represented by complex data structures such as trees and graphs. This increasing complexity in problem and solution domains calls for parallel programming models that are able to exploit the unstructured patterns of applications and, at the same time, hide the complexity of the underlying execution platform.

Task-based programming models allow for a flexible approach to expressing irregular parallelism, as opposed to programming models that follow a specific parallel paradigm such as Map-Reduce (Dean and Ghemawat, 2008) and its alternative Spark (Zaharia et al., 2016). Other parallel programming models such as MPI (Gropp et al., 1999) and OpenMP (Dagum and Menon, 1998) are widely used. However, gaining performance using these models requires certain programming expertise.
In addition, these models expose the details of the underlying execution infrastructure, which can compromise the programmability of applications.

Using a task-based programming model, applications are decomposed into tasks. These tasks are organized in the form of a Directed Acyclic Graph (DAG) by detecting the data dependencies between them, so that each task has predecessor(s) and successor(s). Data dependencies between tasks control the scheduling of tasks and their execution. Tasks are launched for execution when they are dependency-free, i.e. when all their predecessors have finished their execution successfully.

¹ Barcelona Supercomputing Center, Barcelona, Spain
Corresponding author: Hatem Elshazly, Barcelona Supercomputing Center (BSC), C/ Jordi Girona, 31, 08034 Barcelona, Spain.
Email: hatem.elshazly@bsc.es
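To make the difference between the two release policies concrete, the following is a minimal sketch in plain Python. It is not the PyCOMPSs implementation and uses no PyCOMPSs API: the `Task` class, the `schedule` and `makespan` functions, the task names and all durations are hypothetical, chosen only for illustration. The sketch assumes unlimited workers, zero data-transfer cost, and that each output becomes available at a known offset from the start of its producer task. Under the default (lazy) release, a successor starts only when its producer has finished; under eager release, it starts as soon as the specific data it reads is available.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task description used only for this illustration."""
    name: str
    duration: float
    # Output data name -> offset from task start at which it is produced.
    outputs: dict = field(default_factory=dict)
    # Names of the data items this task reads.
    inputs: list = field(default_factory=list)


def schedule(tasks, eager):
    """Return {task name: (start, end)}, assuming unlimited workers.

    `tasks` must be given in a valid topological order of the DAG.
    """
    # Map each data item to the task that produces it.
    producer = {data: t for t in tasks for data in t.outputs}
    times = {}
    for t in tasks:
        start = 0.0
        for data in t.inputs:
            p = producer[data]
            p_start, p_end = times[p.name]
            # Eager: released when this datum is produced.
            # Lazy: released only when the whole predecessor finishes.
            ready = p_start + p.outputs[data] if eager else p_end
            start = max(start, ready)
        times[t.name] = (start, start + t.duration)
    return times


def makespan(times):
    """Total execution time of the workflow."""
    return max(end for _, end in times.values())


# "produce" emits "a" after 2 time units and "b" only at its end (10).
workflow = [
    Task("produce", 10.0, outputs={"a": 2.0, "b": 10.0}),
    Task("use_a", 5.0, inputs=["a"]),
    Task("use_b", 3.0, inputs=["b"]),
]

lazy_times = schedule(workflow, eager=False)
eager_times = schedule(workflow, eager=True)
print(makespan(lazy_times))   # 15.0
print(makespan(eager_times))  # 13.0
```

In this toy workflow, `use_a` overlaps with `produce` under eager release (it runs from time 2 to 7 instead of 10 to 15), which shortens the total execution time from 15 to 13 time units. The numbers come from the made-up durations above and say nothing about the gains measured in the paper.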