A case study for learning to design SPMD applications efficiently on multicore clusters*

Ronal Muresano, Dolores Rexachs and Emilio Luque
Computer Architecture and Operating System Department (CAOS)
Universitat Autònoma de Barcelona, Barcelona, SPAIN
rmuresano@caos.uab.es, dolores.rexachs@uab.es, emilio.luque@uab.es

* This research has been supported by the MEC-MICINN Spain under contract TIN2007-64974.
* Contact Author: R. Muresano, rmuresano@caos.uab.es
This paper is addressed to the FECS conference.

Abstract: The current trend in high performance computing (HPC) is toward clusters composed of multicore nodes. These nodes use a hierarchical communication architecture, which students must handle carefully when they want to improve parallel performance metrics. For this reason, we have proposed a teaching methodology, based on theoretical and practical sections, that helps students in computational sciences develop their SPMD (Single Program Multiple Data) parallel applications efficiently. This novel methodology for teaching parallel programming is centered on improving the parallel applications written by students through the experience they gain during classes. Students achieved these improvements by applying novel strategies to manage the imbalance issues generated by the hierarchical communication architecture of multicore clusters. The methodology also allows students to discover how to improve their applications using characterization, mapping and scheduling strategies. Finally, SPMD applications were selected because they can present imbalance issues due to the different communication links found in multicore clusters, and these issues create interesting challenges for students who wish to enhance the performance metrics. In conclusion, by applying our teaching methodology, students gained significant skill in designing their SPMD parallel applications.

Keywords: Performance Metrics, Multicore, Teaching Models, Methodology for efficient execution, SPMD.

1. Introduction

The inclusion of parallel processing in undergraduate degrees has been widely justified, and it has been integrated into the curriculum as parallel resources have become much easier to use and much more widely available [1][2]. Currently, the trend in high performance computing (HPC) is toward clusters composed of multicore nodes, and the learning process has to be updated to follow this trend. Multicore nodes also add levels of heterogeneity inside parallel environments, and students have to handle these heterogeneities carefully when they wish to improve the performance metrics. Such computation and communication heterogeneities in the nodes generate interesting challenges that students of parallel programming courses must be prepared to deal with when they want to enhance application performance.

Also, the integration of multicore nodes in High Performance Computing (HPC) has allowed the inclusion of more parallelism within nodes. However, this parallelism must deal with some problems present in multicore environments [3]. Issues such as the number of cores per chip, data locality, shared cache, bus interconnection and memory bandwidth are becoming more important for managing parallel execution efficiently. The increasing use of multicore in HPC is evidenced by the TOP500¹ list, in which most of today's clusters are built from multicore nodes. For this reason, students have to learn new parallel programming strategies with the aim of enhancing the performance metrics in these environments.

¹ TOP500 is a list that ranks the parallel machines used for high performance computing: www.top500.org

Indeed, the need for students to learn parallel application topics and tools is growing [4]. In fact, parallel application development has been included in different areas such as biology, physics and engineering [5], and these inclusions have created the need to incorporate this important topic in the computer science curriculum. Including the efficient management of multicore environments in the parallel programming course content is very important because the current trend in computational science is to use parallel computing. However, to achieve an efficient execution, the instructor has to manage some issues that students can present when they design their parallel applications [6][7]. One of the difficulties for students is changing their previous programming knowledge, which is focused on designing sequential applications. This focus is totally different when parallel applications are programmed, and even more so when these applications have to be designed for a multicore cluster. In this regard, the division of tasks between cores and the hierarchical communication architecture of multicore clusters are topics that students have to consider when