Targeting heterogeneous SoCs using MCAPI

Peng Sun, Sunita Chandrasekaran and Barbara Chapman
Department of Computer Science, University of Houston, Houston, TX, 77004, USA
{psun5,sunita,chapman}@cs.uh.edu

Abstract—Programming emerging complex embedded systems is a challenge. Embedded applications are complex in their own right, so code reuse and ease of adoption are essential. Unfortunately, existing software solutions expect programmers to handle most of the low-level details, giving rise to a plethora of non-portable, proprietary commercial solutions. The need for industry standards is becoming increasingly critical. The Multicore Association (MCA) offers industry-driven, standards-based approaches that provide portable and scalable solutions. In this paper, we use the Multicore Communication API (MCAPI), one of the APIs popular in the embedded industry for inter-core communication and synchronization. We have extended the MCAPI reference implementation to the Freescale QorIQ P4080 multicore platform, which consists of eight e500mc Power Architecture™ cores and specialized accelerators such as the Pattern Matching Engine (PME) and the Security Engine (SEC), integrated with the Data Path Acceleration Architecture (DPAA). Using MCAPI, we establish communication with the PME from the Power cores, abstracting away all low-level configuration and function calls.

1. INTRODUCTION

A heterogeneous architecture is typically one that consists of a variety of different types of computation units. In embedded systems, the computation units could be a general-purpose processor (e.g., an ARM or Power CPU), a special-purpose processor (e.g., a DSP or GPU), or even a hardware accelerator (e.g., an FPGA or a pattern matching engine). In this project, our goal is to design and create a portable programming paradigm for heterogeneous embedded systems.
We plan to leverage the recently ratified accelerator features of OpenMP, the de facto shared-memory programming model, which may serve as a vehicle for productive programming of heterogeneous embedded systems. To begin with, we have studied the industry-standard Multicore Association (MCA) APIs, which specify essential application-level semantics for communication, synchronization, resource management and task management, among other capabilities. For the work presented in this paper, we have explored the Multicore Communication API (MCAPI), which is designed to capture the basic communication and synchronization required by closely distributed embedded systems. We have chosen the Freescale QorIQ P4080 as the target and evaluation platform for this work. The main challenge we identified is establishing communication mechanisms between the P4080 multicore processor and the Data Path Acceleration Architecture (DPAA), Security Engine (SEC 4.0) and Pattern Matching Engine (PME) accelerators.

The paper is organized as follows: Section 2 discusses MCAPI in detail, Section 3 gives an overview of the target platform used in this work, Section 4 elaborates on the implementation details, and Section 6 concludes the paper and discusses future work.

2. MCAPI

MCAPI is a message-passing API whose purpose is to capture the basic elements of communication and synchronization that are required for closely distributed embedded systems. MCAPI provides a limited number of calls with sufficient communication functionality while remaining simple enough to allow efficient implementations. Additional functionality can be layered on top of the API set. The calls exemplify functionality and are not mapped to any particular existing implementation.
MCAPI defines three types of communication:
• Messages: connectionless datagrams
• Packet channels: connection-oriented, unidirectional, FIFO packet streams
• Scalar channels: connection-oriented, unidirectional, FIFO streams of single-word scalars

MCAPI messages provide a flexible method to transmit data between endpoints without first establishing a connection. The buffers on both the sender and receiver sides must be provided by the user application. MCAPI messages may be sent with different priorities. MCAPI packet channels transmit data between endpoints by first establishing a connection, which can remove the message header and route discovery overhead. Packet channels are unidirectional and deliver data in FIFO (first in, first out) order. The buffers are provided by the MCAPI implementation on the receive side and by the user application on the send side. MCAPI scalar channels transmit scalars very efficiently between endpoints by first establishing a connection. Like packet channels, scalar channels are unidirectional and deliver data in FIFO order. The scalar functions come in 8-bit, 16-bit, 32-bit and 64-bit variants. A scalar receive must be of the same size as the corresponding scalar send; a size mismatch results in an error. We use the reference implementation provided by MCA and extend it to support the Power cores and their accelerators.

3. TARGET PLATFORM

The P4080 development system is a high-performance computing, evaluation and development platform supporting the P4080 Power Architecture processor. Figure 1 shows a preliminary block diagram of the P4080.

A. P4080 Processor

The P4080 processor is based on the e500mc core, which is built on Power Architecture and offers speeds of 1200-1500 MHz.