Classification of Massively Parallel Computer Architectures

Muhammad Ali Shami
Department of ES, Royal Institute of Technology, 16440 Kista, Stockholm
Email: shami@kth.se

Ahmed Hemani
Department of ES, Royal Institute of Technology, 16440 Kista, Stockholm
Email: hemani@kth.se

Abstract—Faced with slowing performance and energy benefits of technology scaling, VLSI/computer architectures have turned from parallel to massively parallel machines for personal and embedded applications in the form of multi- and many-core architectures. Additionally, in the pursuit of the sweet spot between engineering and computational efficiency, massively parallel Coarse Grain Reconfigurable Architectures (CGRAs) have been researched. While these architectures have been surveyed, they have not been rigorously classified to enable objective differentiation and comparison for performance, area and flexibility. In this paper, we extend the well-known Skillicorn taxonomy to create new classes, present a scoring system to rate these classes on flexibility, and present equations for early estimation of area and configuration overheads. Furthermore, we use this extended classification scheme to classify and compare 25 different massively parallel architectures that cover most of the reported CGRAs and other well-known multi- and many-core architectures.

I. INTRODUCTION

Parallelism is becoming mainstream. This is reflected in the widespread adoption of multi-processor architectures that power desktop PCs, mobile phones, video games, graphics, and personal high-performance scientific computing that uses general-purpose GPUs. The research community is also devoting increased attention to all aspects of research in parallel computing and programming models, as can be seen in Figure 1. The figure shows the number of publications in different fields of parallel computing over the last 15 years.
It can be seen that research interest in parallel computing, especially in multi-core and reconfigurable computer architectures, has increased significantly in the last five years. The seminal survey on the Landscape of Parallel Computing [1] from Berkeley attests to this trend of increased interest from the academic world. Massively parallel reconfigurable computing, in the form of both fine-grain FPGAs and Coarse Grain Reconfigurable Architectures (CGRAs), has also been the subject of intense research and is finding widespread adoption in industry. These massively parallel reconfigurable computing architectures have also been surveyed by Reiner Hartenstein [2], Raj Krishnamurthy [3] and Max Baron [4]. Survey papers are of great value to the research community, especially to new participants, as they provide a single source of information on the most significant research and the challenges in a specific domain. This paper complements the survey papers on reconfigurable computing and the wider survey paper on the parallel computing landscape [1] by providing a classification scheme that would enable the community to objectively compare and differentiate present and newer parallel computing architectures, including multi/many-core architectures, CGRAs and FPGAs.

Flynn [5] classified computer architectures into four categories: SISD, SIMD, MISD and MIMD. This taxonomy is perhaps the oldest, simplest and most widely known. Skillicorn [6], citing the broadness of Flynn's taxonomy as a limitation, introduced a more comprehensive way of classifying computer architectures. He used the Data Processor (DP), Instruction Processor (IP), Data Memory (DM) and Instruction Memory (IM) as the building blocks of a computer architecture, and classified architectures according to the number of each block (0, 1 or n) and how the blocks are connected (one to one or one to many). The limitation of Skillicorn's taxonomy is the granularity of its building blocks.
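
The building-block view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Architecture` container, the `flynn_class` helper and the two example machines are hypothetical names introduced here, and the mapping from IP/DP counts to Flynn's categories is our simplified reading, not a table taken from Skillicorn's paper. Connectivity between blocks is left out for brevity.

```python
from dataclasses import dataclass

# Allowed multiplicities for a building block, as listed in the text: 0, 1 or n.
COUNTS = ("0", "1", "n")


@dataclass(frozen=True)
class Architecture:
    """Counts of the four building blocks (hypothetical container)."""
    ip: str  # Instruction Processors
    dp: str  # Data Processors
    im: str  # Instruction Memories
    dm: str  # Data Memories

    def __post_init__(self):
        # Reject any count other than 0, 1 or n.
        for name in ("ip", "dp", "im", "dm"):
            if getattr(self, name) not in COUNTS:
                raise ValueError(f"{name} must be one of {COUNTS}")


def flynn_class(arch: Architecture) -> str:
    """Map IP/DP counts onto Flynn's categories (illustrative assumption)."""
    table = {
        ("1", "1"): "SISD",
        ("1", "n"): "SIMD",
        ("n", "1"): "MISD",
        ("n", "n"): "MIMD",
    }
    # Machines without an instruction processor have no Flynn category here.
    return table.get((arch.ip, arch.dp), "outside Flynn's four categories")


# Two hypothetical example machines.
uniprocessor = Architecture(ip="1", dp="1", im="1", dm="1")
array_processor = Architecture(ip="1", dp="n", im="1", dm="n")

print(flynn_class(uniprocessor))
print(flynn_class(array_processor))
```

Recording four counts rather than two is what lets this representation distinguish machines that Flynn's taxonomy places in the same category, for example two designs with identical IP and DP counts but different memory organisation.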
Because Skillicorn's building blocks are coarse-grained, they cannot exchange their roles. The number of these elements in a computer architecture therefore remains fixed (0, 1 or n). This limitation restricts the application of Skillicorn's taxonomy to modern reconfigurable architectures (FPGAs, CGRAs), where the basic blocks are of finer granularity (gates, LUTs, CLBs) and can assume the role of an IP, a DP or a memory element. The number of IP, DP or memory elements in these architectures therefore changes upon reconfiguration; it is variable and is denoted by the symbol 'v'.

In this paper, we use Skillicorn's taxonomy as the starting point and extend it to cover the richer parallel and reconfigurable computing landscape that has emerged since the paper was published in 1988. We also improve the predictive power of this taxonomy by naming the classes according to a set of rules. We define flexibility and assign each class a relative flexibility value so that the classes can be compared against each other. Using our extended classification, we also try to predict the area of the target architecture. Finally, we apply the extended Skillicorn taxonomy to existing parallel and reconfigurable architectures such as multi/many-core architectures, CGRAs and FPGAs.

Section II extends Skillicorn's original taxonomy so that it applies to the richer parallel and reconfigurable computing landscape. Section III discusses the predictive power of this taxonomy.
Section IV applies this taxonomy to modern parallel and reconfigurable architectures, and finally Section V gives

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops 978-0-7695-4676-6/12 $26.00 © 2012 IEEE DOI 10.1109/IPDPSW.2012.42 337