Invited paper: Parallel programming and Run-time Resource management Framework for Many-core Platforms: The 2PARMA Approach

C. Silvano¹, W. Fornaciari¹, S. Crespi Reghizzi¹, G. Agosta¹, G. Palermo¹, V. Zaccaria¹, P. Bellasi¹, F. Castro¹, S. Corbetta¹, E. Speziale¹, D. Melpignano², J.M. Zins², H. Hübert³, B. Stabernack³, J. Brandenburg³, M. Palkovic⁴, P. Raghavan⁴, C. Ykman-Couvreur⁴, I. Anagnostopoulos⁵, A. Bartzas⁵, D. Soudris⁵, T. Kempf⁶, G. Ascheid⁶, J. Ansari⁶, P. Mähönen⁶, B. Vanthournout⁷

¹ DEI Politecnico di Milano, Italy, ² STMicroelectronics, France, ³ Fraunhofer HHI, Germany, ⁴ IMEC vzw, Belgium and IBBT, Belgium, ⁵ ICCS National Tech. University of Athens, Greece, ⁶ RWTH Aachen University, Germany, ⁷ Synopsys, Belgium

Abstract—Real-time applications, hard or soft, are raising the challenge of unpredictability. This problem is especially hard on modern, dynamic multiprocessor platforms which, while providing potentially high performance, make timing prediction extremely difficult. Also, with the growing software content in embedded systems and the diffusion of highly programmable and re-configurable platforms, software is given an unprecedented degree of control over resource utilization. The 2PARMA project aims at overcoming the lack of parallel programming models and run-time resource management techniques to exploit the features of many-core processor architectures. The main goals of the 2PARMA project are: the definition of a parallel programming model combining component-based and single-instruction multiple-thread approaches, instruction set virtualisation based on portable byte-code, run-time resource management policies and mechanisms, as well as design space exploration methodologies for many-core computing architectures.

This work is supported by the E.C. funded FP7-248716 2PARMA Project, www.2parma.eu

I. INTRODUCTION

The current trend in computing architectures is to replace complex super-scalar architectures with many processing units connected by an on-chip network able to accommodate such a high number of cores, satisfying the needs for communication and data transfers. This trend is mostly dictated by inherent silicon technology frontiers, which are getting closer as process density levels increase. The number of cores to be integrated in a single chip is expected to continue to increase rapidly in the coming years, moving from multi-core to many-core architectures. This trend will require a global rethinking of software and hardware approaches.

Multi-core architectures are nowadays prevalent in general-purpose computing and in high-performance computing. In addition to dual- and quad-core general-purpose processors, more scalable multi-core architectures are widely adopted for high-end graphics and media processing. Such platforms are becoming widespread as silicon technology develops in the sub-50nm nodes. The transition to multi-core is almost a forced choice to escape the silicon efficiency crisis caused by the looming power wall, the application complexity increase and the design complexity gap under tightening time-to-market constraints. While multi-core architectures are common in general-purpose and domain-specific computing, there is no one-size-fits-all solution. General-purpose multi-cores are still designed to deliver outstanding single-thread performance under very general conditions in terms of workload mix, memory footprint, runtime environment and legacy code compatibility. These requirements lead to architectures featuring a few complex, high-clock-speed mega-cores with complex instruction sets, deep pipelines, non-blocking multi-level caches with hardware-supported coherency and advanced virtualization support.
Today, we see a trend towards many-core fabrics, with a throughput-oriented memory hierarchy featuring software-controlled local memories, FIFOs and specialized DMA engines. As a result, an SoC platform today is a highly heterogeneous system. It integrates a general-purpose multi-core CPU and a number of domain-specific many-core subsystems. Examples of such emerging multi-core platforms are Intel's SCC [1] and ST's Platform 2012 [2].

System-level design and optimization of computing systems is a highly challenging task, especially since such systems are becoming more and more complex from both hardware and software perspectives [3]. Over the last few years, the main focus in the design of computing systems has been to provide good performance and at the same time achieve low power consumption. To achieve optimal results, good coordination between hardware and software design is required. Therefore, memory-intensive applications running on embedded platforms (e.g., multimedia) must be closely linked to the underlying Operating System (OS) and efficiently utilize the available hardware resources. Putting all this together, it is clear that developing a complete, working system is an integration nightmare [3].

The 2PARMA project focuses on the design of a class of