Dataflow programs analysis and optimization using model predictive control techniques: an example of bounded buffer scheduling

M. Canale, Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
S. Casale-Brunet, E. Bezati, M. Mattavelli, EPFL SCI-STI-MM, École Polytechnique Fédérale de Lausanne, Switzerland
J.W. Janneck, Department of Computer Science, Lund University, Sweden

Abstract—This paper presents a new approach to buffer dimensioning for dynamic dataflow implementations. A novel transformation applied to the execution trace graph of a dataflow program is introduced in order to generate an event-driven system. It is shown how model predictive control techniques can be applied to such a system to analyse the execution space of a dataflow program and to define and minimize a bounded buffer size configuration that corresponds to a deadlock-free execution. Experimental results obtained using two design examples, i.e. a JPEG and an MPEG HEVC decoder, are reported and compared to state-of-the-art results in order to show the effectiveness of the introduced approach.

I. INTRODUCTION

In several signal processing application areas, the use of dataflow programs as a means to describe the algorithm implementation constitutes an interesting alternative to the classical sequential programming approach. In particular, it allows a more extensive program execution analysis, which paves the way to a rich variety of parallel implementation solutions; see in this regard [1], [2], [3], [4]. The main reasons for these attractive features are that dataflow programs are highly analyzable, platform independent, and explicitly expose the potential parallelism of the application. Several dataflow computation models are structured as (hierarchical) networks of communicating computational kernels, called actors. As depicted in Fig.
1, actors are connected by directed, lossless, order-preserving point-to-point communication channels, called buffers. Therefore, the flow of data between actors in a dataflow network is fully explicit, and data sharing is only allowed by sending data packets, called tokens. This work considers a very general dynamic dataflow Model of Computation (MoC) called Dataflow Process Network (DPN) with firing. A specific property of this MoC is that an actor execution is performed as a sequence of discrete steps (also called firings). During each step an actor can consume a finite number of input tokens, produce a finite number of output tokens, and modify its own internal state, if it has any (i.e. its state variables). The algorithmic part of a single actor firing is specified inside so-called actions: at each step, according to the state variables and input token values, only one action can be fired. The resulting absence of race conditions makes the behaviour of a dataflow program more robust to different execution policies, whether these are truly parallel or imply some interleaving of the individual actor executions.

The two most widely used techniques for the optimization of dataflow program executions are model checking [5], [6], [7], [8], [9] and Execution Trace Graph (ETG) analysis [10], [11]. The model checking approach is mainly based on an abstract simulation of the design and includes several methods for dynamic analysis. For example, symbolic representations (e.g. Binary Decision Diagrams) and temporal logics (e.g. Linear Time Logic, Computation Tree Logic) mainly focus on answering yes/no questions, which turns performance evaluation into a tedious process. The main disadvantage of the model checking approach is that there is often no way to search for trade-offs between accuracy and scalability.
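Returning to the DPN firing semantics outlined above, the discipline can be illustrated with a minimal sketch. All names and the example actions below are hypothetical, chosen purely for illustration; they are not the paper's notation or implementation. The sketch shows the essential DPN properties: at most one action fires per step, each action's guard inspects only the actor's state and its input tokens, and a firing atomically consumes inputs, produces outputs, and updates state.

```python
from collections import deque

class Actor:
    """Illustrative DPN actor with firing (hypothetical example, not the
    paper's implementation). Each action has a guard over the actor state
    and input tokens; at most one action fires per discrete step."""

    def __init__(self):
        self.state = 0  # internal state (the actor's state variables)

    def actions(self, inputs):
        # Each entry: (guard, body); the body maps the consumed token
        # to a list of produced output tokens.
        return [
            # action 'even': enabled when the next input token is even
            (len(inputs) >= 1 and inputs[0] % 2 == 0, lambda tok: [tok * 2]),
            # action 'odd': enabled when the next input token is odd
            (len(inputs) >= 1 and inputs[0] % 2 == 1, lambda tok: [tok + 1]),
        ]

    def fire(self, in_buf, out_buf):
        """One discrete step: pick the single enabled action, consume input
        tokens, produce output tokens, update state. Returns True if fired."""
        for guard, body in self.actions(list(in_buf)):
            if guard:
                tok = in_buf.popleft()      # consume one input token
                out_buf.extend(body(tok))   # produce output tokens
                self.state += 1             # modify the internal state
                return True
        return False                        # no action enabled: actor blocks

in_buf, out_buf = deque([1, 2, 3]), deque()
a = Actor()
while a.fire(in_buf, out_buf):
    pass
print(list(out_buf))  # [2, 4, 4]: one firing per input token
```

Because each firing touches only the actor's own state and its point-to-point buffers, this per-step semantics holds unchanged whether the actors of a network run interleaved or truly in parallel, which is the robustness property noted above.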
In fact, when dealing with high-dimensional applications, the design space can easily become infeasible to explore exhaustively (e.g. even unreachable states are considered) and the designer is left without even a partial estimate of the system performance. On the other hand, the methods based on the ETG typically deal with the analysis of a graph data structure, which is obtained after a high-level and architecture-independent simulation. Therefore, the entire computation can be seen as a collection of nodes that represent the execution steps and a collection of directed arcs that represent the functional dependencies. In this case, analysis methods are based on the processing of a directed acyclic graph. As a consequence, the design space can be efficiently explored using appropriate heuristics and reduced according to the optimization needs without impacting the accuracy of the results.

The novelty introduced in this work is the transformation of the ETG into a discrete event system, i.e. a Petri Net (PN). Such a transformation makes it possible to explore the design space using well-known system control strategies in order to optimize the mapping and partitioning of a dataflow application. In particular, it is shown how this transformation can be efficiently applied also to complex designs and very large execution trace graphs. The evaluation of a bounded-memory and deadlock-free buffer size configuration of a dataflow program is used as the context for showing the power of this approach, since it is one of the classical and most challenging problems in the domain of dataflow applications. Yet, in general, overall buffer minimization/dimensioning remains an important optimization objective for achieving cost-effective implementations on embedded processing systems that may suffer from severe

U.S. Government work not protected by U.S. copyright
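To make the bounded-buffer problem concrete, the following sketch checks whether all firings recorded in an ETG can complete under a fixed buffer-size configuration. This is an illustrative greedy feasibility check under assumed data structures, not the paper's MPC-based method: the ETG is assumed to be given as a dependency map per firing, with separate maps recording how many tokens each firing produces to or consumes from each buffer.

```python
from collections import defaultdict

def schedulable(etg, produces, consumes, capacity):
    """Illustrative check (hypothetical helper, not the paper's algorithm):
    can every firing in the ETG complete under the given buffer bounds?
    etg: firing -> list of firings it depends on (directed acyclic graph);
    produces/consumes: firing -> {buffer: token count};
    capacity: buffer -> size bound of the configuration."""
    fill = defaultdict(int)   # current number of tokens in each buffer
    done = set()
    pending = set(etg)
    while pending:
        fired = None
        for f in pending:
            deps_ok = all(d in done for d in etg[f])
            room_ok = all(fill[b] + k <= capacity[b]
                          for b, k in produces.get(f, {}).items())
            if deps_ok and room_ok:
                fired = f
                break
        if fired is None:
            return False      # deadlock: no firing is enabled under the bound
        for b, k in consumes.get(fired, {}).items():
            fill[b] -= k
        for b, k in produces.get(fired, {}).items():
            fill[b] += k
        done.add(fired)
        pending.remove(fired)
    return True               # all firings completed: deadlock-free

# Two producer firings and one consumer firing sharing buffer 'q':
etg = {"p1": [], "p2": ["p1"], "c1": ["p1"]}
produces = {"p1": {"q": 1}, "p2": {"q": 1}}
consumes = {"c1": {"q": 1}}
print(schedulable(etg, produces, consumes, {"q": 1}))  # True: c1 drains q before p2
print(schedulable(etg, produces, consumes, {"q": 0}))  # False: p1 can never fire
```

The tiny example already exhibits the core trade-off the paper targets: a bound of one token on `q` is sufficient for a deadlock-free execution, while a bound of zero deadlocks immediately, and the goal of buffer dimensioning is to find the minimal such configuration over the whole ETG.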