Hierarchical Supporting Structure for Dynamic Organization in Many-core Computing Systems

Liang Guang¹, Syed M. A. H. Jafri¹,², Bo Yang¹, Juha Plosila¹ and Hannu Tenhunen¹,²
¹ University of Turku, Turku, Finland
² Royal Institute of Technology, Stockholm, Sweden

Keywords: Many-core Systems, Dynamic Organization, Dependability, Software/Hardware Co-design.

Abstract: Hierarchical supporting structures for dynamic organization in many-core computing systems are presented. With profound hardware variations and unpredictable errors, dependability becomes a challenging issue in emerging many-core systems. To provide fault tolerance against processor failures or performance degradation, dynamic organization is proposed, which allows clusters to be created and updated at run-time. Hierarchical supporting structures are designed for each level of monitoring agents, to enable the tracing, storing and updating of component and system status. These supporting structures need to follow software/hardware co-design to keep the overhead small and scalable, while accommodating the functions of the agents on the corresponding level. This paper presents the architectural design, functional simulation and implementation analysis. The study demonstrates that the proposed structures facilitate dynamic organization in case of processor failures and incur small area overhead on many-core systems.

1 INTRODUCTION

With constant technology scaling, many-core computing systems have become a reality (Vangal et al., 2008). By exploiting massive parallelism, such systems are expected to provide much higher theoretical performance than single-core or few-core chips. For instance, TeraFLOPS (Vangal et al., 2008), an 80-core processor, achieves over 1.0 TFLOPS ($10^{12}$ floating-point operations per second).
To provide interconnection in a scalable manner, Network-on-Chip (NoC) is widely adopted as the communication architecture for many-core systems (Jantsch and Tenhunen, 2003; Vangal et al., 2008). In particular, regular network layouts (e.g., mesh) with predictable link delay and electrical properties are favoured for general-purpose NoCs (Pamunuwa et al., 2004).

Dependability is a major design challenge in many-core systems. While an increasing number of resources can be integrated onto a single die, the rate of fault occurrence is also rising (Shamshiri et al., 2008). For one thing, due to small feature sizes, process variation and aging, the probability of permanent and transient faults increases in VLSI chips (Collet et al., 2009). For another, deviations in the supply voltage and threshold voltage may lead to longer critical paths and consequently worse performance (Unsal et al., 2006). When certain resources in a many-core system fail, the system should still perform properly with the remaining resources, in order to provide dependable computing and improve the yield (Shamshiri et al., 2008).

Hierarchical agent-based adaptation (H2A) is a systematic and generic approach to achieve self-adaptive parallel computing (Guang et al., 2010). Software and hardware agents are embedded on different organization levels to monitor and reconfigure global and local services, including energy management and dependable computing. The top-level agent, the platform agent, is responsible for system-level, coarse-grained resource allocation. Regional-level agents, the cluster agents, manage intra-cluster services, e.g., energy management. In particular, the agents need to enable run-time resource reconfiguration in case of component failures.

This paper proposes hierarchical supporting structures on agent-based many-core systems, to enable run-time status tracing, storing and updating.
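As a rough illustration of the agent hierarchy described above, the following Python sketch models a platform agent that allocates cores to clusters and cluster agents that trace and store per-core status. All class, method and field names are hypothetical, and the policy of swapping an unusable core for a spare one is an assumption made purely for illustration; it is not the software/hardware design proposed in this paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


@dataclass
class ClusterAgent:
    """Regional-level agent (hypothetical model): stores the status of its cores."""
    name: str
    cores: dict[int, Status] = field(default_factory=dict)

    def report(self, core_id: int, status: Status) -> None:
        # Update the stored status of one traced core.
        if core_id not in self.cores:
            raise KeyError(f"core {core_id} is not in cluster {self.name}")
        self.cores[core_id] = status

    def unusable(self) -> list[int]:
        # Cores that are failed or degraded, in ascending id order.
        return sorted(c for c, s in self.cores.items() if s is not Status.OK)


@dataclass
class PlatformAgent:
    """Top-level agent (hypothetical model): coarse-grained core allocation."""
    spare: list[int]
    clusters: dict[str, ClusterAgent] = field(default_factory=dict)

    def create_cluster(self, name: str, size: int) -> ClusterAgent:
        if size > len(self.spare):
            raise ValueError("not enough spare cores")
        cores = {self.spare.pop(0): Status.OK for _ in range(size)}
        self.clusters[name] = ClusterAgent(name, cores)
        return self.clusters[name]

    def reorganize(self, name: str) -> list[int]:
        """Assumed policy: swap each unusable core for a spare; return removed ids."""
        cluster = self.clusters[name]
        removed = []
        for core_id in cluster.unusable():
            if not self.spare:
                break  # no spare left; the remaining unusable cores stay listed
            del cluster.cores[core_id]
            cluster.cores[self.spare.pop(0)] = Status.OK
            removed.append(core_id)
        return removed


# Usage: a six-core platform, one four-core cluster, and one core failure.
platform = PlatformAgent(spare=list(range(6)))
cluster = platform.create_cluster("c0", size=4)
cluster.report(2, Status.FAILED)
removed = platform.reorganize("c0")
print(removed, sorted(cluster.cores), platform.spare)
```

The split mirrors the division of responsibility in the text: the cluster agent only records local status, while the decision to change cluster membership is taken at the platform level.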
We present the concept of dynamic organization, which allows a cluster to be dynamically created and updated in case of permanent failures or performance degradation. In addition, to offer scalability towards future chips with hundreds to thousands of cores, the structures need to follow software/hardware (SW/HW) co-design techniques to

Guang L., M. A. H. Jafri S., Yang B., Plosila J. and Tenhunen H. Hierarchical Supporting Structure for Dynamic Organization in Many-core Computing Systems. DOI: 10.5220/0004389702520261. In Proceedings of the 3rd International Conference on Pervasive Embedded Computing and Communication Systems (SANES-2013), pages 252-261. ISBN: 978-989-8565-43-3. Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)