Classification of Massively Parallel Computer Architectures

Muhammad Ali Shami
Department of ES, Royal Institute of Technology, 16440 Kista, Stockholm
Email: shami@kth.se

Ahmed Hemani
Department of ES, Royal Institute of Technology, 16440 Kista, Stockholm
Email: hemani@kth.se

Abstract—Faced with slowing performance and energy benefits of technology scaling, VLSI/computer architectures have turned from parallel to massively parallel machines for personal and embedded applications in the form of multi- and many-core architectures. Additionally, in the pursuit of the sweet spot between engineering and computational efficiency, massively parallel Coarse Grain Reconfigurable Architectures (CGRAs) have been researched. While these architectures have been surveyed, they have not been rigorously classified to enable objective differentiation and comparison for performance, area and flexibility. In this paper, we extend the well-known Skillicorn taxonomy to create new classes, present a scoring system to rate these classes on flexibility, and present equations for early estimation of area and configuration overheads. Furthermore, we use this extended classification scheme to classify and compare 25 different massively parallel architectures that cover most of the reported CGRAs and other well-known multi- and many-core architectures.

I. INTRODUCTION

Parallelism is becoming mainstream. This is reflected in the widespread adoption of multi-processor architectures that power desktop PCs, mobile phones, video games, graphics, and personal high-performance scientific computing that uses general-purpose GPUs. The research community is also devoting increased attention to all aspects of research in parallel computing and programming models, as can be seen in Figure 1. The figure shows the number of publications in different fields of parallel computing over the last 15 years.
It can be seen that research interest in parallel computing, especially in multi-core and reconfigurable computer architectures, has increased significantly in the last five years. The seminal survey on the Landscape of Parallel Computing [1] from Berkeley attests to this increased interest from the academic world.

Massively parallel reconfigurable computing, both fine-grain FPGAs and Coarse Grain Reconfigurable Architectures (CGRAs), has also been the subject of intense research and is finding widespread adoption in industry. These massively parallel reconfigurable computing architectures have also been surveyed by Reiner Hartenstein [2], Raj Krishnamurthy [3] and Max Baron [4]. Survey papers are of great value to the research community, especially new participants, as they provide a single source of information on the most significant research and the challenges in a specific domain. This paper complements the survey papers on reconfigurable computing and the wider survey paper on the parallel computing landscape [1] by providing a classification scheme that would enable the community to objectively compare and differentiate present and newer parallel computing architectures, including multi/many-core architectures, CGRAs and FPGAs.

Flynn [5] classified computer architectures into four categories: SISD, SIMD, MISD and MIMD. This taxonomy is perhaps the oldest, simplest and most widely known. Skillicorn [6], citing the broadness of Flynn's taxonomy as a limitation, introduced a more comprehensive way of classifying computer architectures. He used the Data Processor (DP), Instruction Processor (IP), Data Memory (DM) and Instruction Memory (IM) as the building blocks of a computer architecture and classified architectures according to the number of each block (0, 1 or n) and how the blocks are connected (one-to-one or one-to-many). The limitation of Skillicorn's taxonomy is the granularity of its building blocks.
Because of their coarse granularity, these building blocks cannot exchange roles, so the number of each element in a computer architecture remains fixed (0, 1 or n). This limitation restricts the application of Skillicorn's taxonomy to modern reconfigurable architectures (FPGAs, CGRAs), where the basic blocks are of finer granularity (gates, LUTs, CLBs) and can assume the role of an IP, a DP or a memory element. The number of IPs, DPs or memory elements in these architectures therefore changes upon reconfiguration; it is variable and is denoted by the symbol 'v'.

In this paper, we use Skillicorn's taxonomy as the starting point and extend it to cover the richer parallel and reconfigurable computing landscape that has emerged since the paper was published in 1988. We also improve the predictive power of this taxonomy by naming the classes according to a set of rules. We define flexibility and give each class a relative flexibility value so that classes can be compared against each other. Using our extended classification, we also try to predict the area of the target architecture. Finally, we apply the extended Skillicorn taxonomy to existing parallel and reconfigurable architectures such as multi/many-core architectures, CGRAs and FPGAs.

Section II extends Skillicorn's original taxonomy so that it applies to the richer parallel and reconfigurable computing landscape. Section III discusses the predictive power of this taxonomy.
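
As a minimal sketch of the building-block view described in the introduction, the Python fragment below records how many IPs, DPs, IMs and DMs an architecture class has, using the counts 0, 1 and n from Skillicorn's taxonomy together with the variable count 'v' for reconfigurable fabrics. The names `Count` and `ArchClass` and the three example entries are illustrative assumptions, not part of the taxonomy itself, and the connectivity between blocks is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Count(Enum):
    """Number of instances of one building block in an architecture."""
    ZERO = "0"
    ONE = "1"
    MANY = "n"
    VARIABLE = "v"  # changes upon reconfiguration (the extension discussed in the text)


@dataclass(frozen=True)
class ArchClass:
    """Hypothetical record of block counts; connectivity is not modelled."""
    ip: Count  # Instruction Processors
    dp: Count  # Data Processors
    im: Count  # Instruction Memories
    dm: Count  # Data Memories

    def signature(self) -> str:
        """Return a compact, human-readable description of the counts."""
        return (f"IP={self.ip.value} DP={self.dp.value} "
                f"IM={self.im.value} DM={self.dm.value}")

    def fits_original_taxonomy(self) -> bool:
        """True when every count is fixed (0, 1 or n), i.e. no block is 'v'."""
        return Count.VARIABLE not in (self.ip, self.dp, self.im, self.dm)


# Illustrative entries only (assumed for this sketch, not taken from the paper's tables).
uniprocessor = ArchClass(Count.ONE, Count.ONE, Count.ONE, Count.ONE)
array_processor = ArchClass(Count.ONE, Count.MANY, Count.ONE, Count.MANY)
fpga_like = ArchClass(Count.VARIABLE, Count.VARIABLE, Count.VARIABLE, Count.VARIABLE)

for name, arch in [("uniprocessor", uniprocessor),
                   ("array processor", array_processor),
                   ("FPGA-like fabric", fpga_like)]:
    print(name, "|", arch.signature(), "| fixed counts:", arch.fits_original_taxonomy())
```

In this sketch, only the FPGA-like entry reports that its counts are not fixed, which is the case the variable count 'v' is meant to capture.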
Section IV applies this taxonomy to modern parallel and reconfigurable architectures, and finally Section V gives

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum 978-0-7695-4676-6/12 $26.00 © 2012 IEEE DOI 10.1109/IPDPSW.2012.42 337