Realizing FIFO Communication when Mapping Kahn Process Networks onto the Cell

Dmitry Nadezhkin, Sjoerd Meijer, Todor Stefanov, and Ed Deprettere

Leiden Institute of Advanced Computer Science
Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
{dmitryn,smeijer,stefanov,edd}@liacs.nl

Abstract. Kahn Process Networks (KPN) are an appealing model of computation to specify streaming applications. When a KPN has to execute on a multi-processor platform, a mapping of the KPN model to the execution platform model should mitigate all possible overhead introduced by the mismatch between primitives realizing the communication semantics of the two models. In this paper, we consider mappings of KPN specifications of streaming applications onto the Cell BE multi-processor execution platform. In particular, we investigate how to realize the FIFO communication of a KPN on the Cell BE in order to reduce the synchronization overhead. We present a solution based on token packetization and show the performance results of five different streaming applications mapped onto the Cell BE.

Key words: Models of Computation, Kahn Process Networks, distributed FIFO communication, the Cell BE platform

1 Introduction

One of the driving forces that motivated the emergence of multi-processor systems on chip (MPSoCs) originates from the complexity of modern applications [1]. Many applications are specified with complex block diagrams that incorporate multiple algorithms. Such applications are called heterogeneous. The emergence of heterogeneous applications led to the design of heterogeneous MPSoC architectures, which provide improved performance by executing the different algorithms that are part of an application on optimized/specific processing components of an MPSoC. However, heterogeneous MPSoCs are very hard to program efficiently, and it is still not clear how this could be done in a systematic and possibly automated way.
It is a common belief that the key to solving the programming problem is to use parallel models of computation (MoC) to specify applications [2]. This is because the structure and execution semantics of parallel MoCs match the structure and execution semantics of MPSoCs, i.e., a parallel MoC consists of tasks that can execute in parallel and an MPSoC consists of processing components that run in parallel. Nevertheless, in many cases there is a mismatch