Dataflow programs analysis and optimization using model predictive control techniques: an example of bounded buffer scheduling

M. Canale, Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
S. Casale-Brunet, E. Bezati, M. Mattavelli, EPFL SCI-STI-MM, École Polytechnique Fédérale de Lausanne, Switzerland
J.W. Janneck, Department of Computer Science, Lund University, Sweden

Abstract—This paper presents a new approach to buffer dimensioning for dynamic dataflow implementations. A novel transformation applied to the execution trace graph of a dataflow program is introduced in order to generate an event-driven system. It is shown how model predictive control techniques can be applied to such a system to analyse the execution space of a dataflow program and to define and minimize a bounded buffer size configuration that corresponds to a deadlock-free execution. Experimental results obtained using two design examples, i.e. a JPEG and an MPEG HEVC decoder, are reported and compared to state-of-the-art results in order to show the effectiveness of the introduced approach.

I. INTRODUCTION

In several signal processing application areas, the use of dataflow programs as a means to describe the algorithm implementation constitutes an interesting alternative to the classical sequential programming approach. In particular, it allows a more extensive program execution analysis, which paves the way to a rich variety of parallel implementation solutions; see in this regard [1], [2], [3], [4]. The main reasons for these attractive features are that dataflow programs are highly analyzable, platform independent, and explicitly expose the potential parallelism of the application. Several dataflow computation models are structured as (hierarchical) networks of communicating computational kernels, called actors. As depicted in Fig.
1, actors are connected by directed, lossless, order-preserving point-to-point communication channels, called buffers. Therefore, the flow of data between actors in a dataflow network is fully explicit, and data sharing is only allowed by sending data packets, called tokens. This work considers a very general dynamic dataflow Model of Computation (MoC) called Dataflow Process Network (DPN) with firing. A specific property of this MoC is that an actor execution is performed as a sequence of discrete steps (also called firings). During each step an actor can consume a finite number of input tokens, produce a finite number of output tokens, and modify its own internal state, if it has any (i.e. its state variables). The algorithmic part of a single actor firing is specified inside so-called actions: at each step, according to the state variables and input token values, only one action can be fired. The resulting absence of race conditions makes the behaviour of a dataflow program more robust to different execution policies, whether these are truly parallel or imply some interleaving of the individual actor executions.

The two most widely used techniques for the optimization of dataflow program executions are model checking [5], [6], [7], [8], [9] and Execution Trace Graph (ETG) analysis [10], [11]. The model checking approach is mainly based on an abstract simulation of the design and includes several methods for dynamic analysis. For example, symbolic representations (e.g. Binary Decision Diagrams) and temporal logics (e.g. Linear Time Logic, Computation Tree Logic) mainly focus on answering yes/no questions, which turns performance evaluation into a tedious process. The main disadvantage of the model checking approach is that there is often no way to search for trade-offs between accuracy and scalability.
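Returning to the DPN firing semantics outlined above, the discipline can be illustrated with a minimal sketch. All names and the example actions below are hypothetical, chosen purely for illustration; they are not the paper's notation or implementation. The sketch shows the essential DPN properties: at most one action fires per step, each action's guard inspects only the actor's state and its input tokens, and a firing atomically consumes inputs, produces outputs, and updates state.

```python
from collections import deque

class Actor:
    """Illustrative DPN actor with firing (hypothetical example, not the
    paper's implementation). Each action has a guard over the actor state
    and input tokens; at most one action fires per discrete step."""

    def __init__(self):
        self.state = 0  # internal state (the actor's state variables)

    def actions(self, inputs):
        # Each entry: (guard, body); the body maps the consumed token
        # to a list of produced output tokens.
        return [
            # action 'even': enabled when the next input token is even
            (len(inputs) >= 1 and inputs[0] % 2 == 0, lambda tok: [tok * 2]),
            # action 'odd': enabled when the next input token is odd
            (len(inputs) >= 1 and inputs[0] % 2 == 1, lambda tok: [tok + 1]),
        ]

    def fire(self, in_buf, out_buf):
        """One discrete step: pick the single enabled action, consume input
        tokens, produce output tokens, update state. Returns True if fired."""
        for guard, body in self.actions(list(in_buf)):
            if guard:
                tok = in_buf.popleft()      # consume one input token
                out_buf.extend(body(tok))   # produce output tokens
                self.state += 1             # modify the internal state
                return True
        return False                        # no action enabled: actor blocks

in_buf, out_buf = deque([1, 2, 3]), deque()
a = Actor()
while a.fire(in_buf, out_buf):
    pass
print(list(out_buf))  # [2, 4, 4]: one firing per input token
```

Because each firing touches only the actor's own state and its point-to-point buffers, this per-step semantics holds unchanged whether the actors of a network run interleaved or truly in parallel, which is the robustness property noted above.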
In fact, when dealing with high-dimensional applications, the design space can easily become infeasible to explore exhaustively (e.g. even unreachable states are considered) and the designer is left without even a partial estimate of the system performance. On the other hand, the methods based on the ETG typically deal with the analysis of a graph data structure, which is obtained after a high-level and architecture-independent simulation. Therefore, the entire computation can be seen as a collection of nodes that represent the execution steps and a collection of directed arcs that represent the functional dependencies. In this case, analysis methods are based on the processing of a directed acyclic graph. As a consequence, the design space can be efficiently explored using appropriate heuristics and reduced according to the optimization needs without impacting the accuracy of the results.

The novelty introduced in this work is the transformation of the ETG into a discrete event system, i.e. a Petri Net (PN). Such a transformation makes it possible to explore the design space using well-known system control strategies in order to optimize the mapping and partitioning of a dataflow application. In particular, it is shown how this transformation can be efficiently applied also to complex designs and very large execution trace graphs. The evaluation of a bounded-memory and deadlock-free buffer size configuration of a dataflow program is used as the context for showing the power of this approach, since it is one of the classical and most challenging problems in the domain of dataflow applications. Yet, in general, overall buffer minimization/dimensioning remains an important optimization objective for achieving cost-effective implementations on embedded processing systems that may suffer from severe

U.S. Government work not protected by U.S. copyright
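To make the bounded-buffer problem concrete, the following sketch checks whether all firings recorded in an ETG can complete under a fixed buffer-size configuration. This is an illustrative greedy feasibility check under assumed data structures, not the paper's MPC-based method: the ETG is assumed to be given as a dependency map per firing, with separate maps recording how many tokens each firing produces to or consumes from each buffer.

```python
from collections import defaultdict

def schedulable(etg, produces, consumes, capacity):
    """Illustrative check (hypothetical helper, not the paper's algorithm):
    can every firing in the ETG complete under the given buffer bounds?
    etg: firing -> list of firings it depends on (directed acyclic graph);
    produces/consumes: firing -> {buffer: token count};
    capacity: buffer -> size bound of the configuration."""
    fill = defaultdict(int)   # current number of tokens in each buffer
    done = set()
    pending = set(etg)
    while pending:
        fired = None
        for f in pending:
            deps_ok = all(d in done for d in etg[f])
            room_ok = all(fill[b] + k <= capacity[b]
                          for b, k in produces.get(f, {}).items())
            if deps_ok and room_ok:
                fired = f
                break
        if fired is None:
            return False      # deadlock: no firing is enabled under the bound
        for b, k in consumes.get(fired, {}).items():
            fill[b] -= k
        for b, k in produces.get(fired, {}).items():
            fill[b] += k
        done.add(fired)
        pending.remove(fired)
    return True               # all firings completed: deadlock-free

# Two producer firings and one consumer firing sharing buffer 'q':
etg = {"p1": [], "p2": ["p1"], "c1": ["p1"]}
produces = {"p1": {"q": 1}, "p2": {"q": 1}}
consumes = {"c1": {"q": 1}}
print(schedulable(etg, produces, consumes, {"q": 1}))  # True: c1 drains q before p2
print(schedulable(etg, produces, consumes, {"q": 0}))  # False: p1 can never fire
```

The tiny example already exhibits the core trade-off the paper targets: a bound of one token on `q` is sufficient for a deadlock-free execution, while a bound of zero deadlocks immediately, and the goal of buffer dimensioning is to find the minimal such configuration over the whole ETG.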