Multi-port abstraction layer for FPGA intensive memory exploitation applications

M. Vanegas a,b,*, M. Tomasi a, J. Díaz a, E. Ros a

a Department of Computer Architecture and Technology, University of Granada, 18071 Granada, Spain
b Microelectronic Group, Pontificia Bolivariana University, Medellín, Colombia

Article history: Received 1 September 2009; Received in revised form 17 April 2010; Accepted 9 May 2010; Available online 19 May 2010

Keywords: Memory-control unit; FPGA; Video processing; Hardware design; Real-time processing

Abstract

We describe an efficient, high-level abstraction, multi-port memory-control unit (MCU) capable of providing data at maximum throughput. This MCU has been developed to take full advantage of FPGA parallelism. Multiple parallel processing entities are possible in modern FPGA devices, but this parallelism is lost when they try to access external memories. To address the problem of multiple entities accessing shared data we propose an architecture with multiple abstract access ports (AAPs) to access one external memory. Bearing in mind that hardware designs in FPGA technology are generally slower than memory chips, it is feasible to build a memory access scheduler by using a suitable arbitration scheme based on a fast memory controller with AAPs running at slower frequencies. In this way, multiple processing units connected through the AAPs can make memory transactions at their slower frequencies and the memory access scheduler can serve all these transactions at the same time by taking full advantage of the memory bandwidth.

© 2010 Elsevier B.V. All rights reserved.

1. Motivation

In recent years FPGA technology has evolved from being a validation framework to a computing platform.
Given that the performance gap between FPGAs and ASICs has been significantly reduced [1], over the last decade ASICs have been replaced by FPGAs in some electronic industries; in the networking field, for instance, routers have FPGAs incorporated into their circuitry to minimize the time to market and related costs. FPGAs are currently being used in the field of system-on-chip design (SoC) [2,3] because they can now offer sufficient resources, and in some cases even on-chip hardcore processors. Modern FPGA devices allow massive parallel on-chip computing through deep-pipelined data-paths with large numbers of super-scalar processing units [4,5]. Furthermore, many processing tasks are executed in a fixed pattern over large volumes of data, and their implementation is thus well suited to exploiting the parallelism offered by FPGAs. Unfortunately, achieving complex systems on FPGAs requires more advanced hardware design skills than building the same systems on GPU-based platforms.

The computing platform's performance is quite sensitive to the behavior and limitations of the memory system. Processors have traditionally used memory hierarchy schemes, in which small memories with faster access times are located close to the processors whereas larger capacity memories with slower access times are located farther away [6]. As a matter of fact, data are moved from the larger memories to the smaller ones based on spatial and temporal data locality principles [7,8]. This allows the processors faster memory access. Although these principles work well for most algorithms, if an irregular data access pattern is required, the system's performance will probably be significantly degraded. In fact, code optimization techniques are highly dependent on data structure [9].
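The locality argument above can be illustrated with a toy direct-mapped cache model (all parameters here are illustrative, not taken from any real processor): a unit-stride traversal exploits spatial locality and misses only once per cache line, whereas a line-sized stride defeats the cache and misses on every access.

```python
# Toy direct-mapped cache model illustrating spatial locality.
# All parameters (line size, number of lines) are hypothetical.

LINE_SIZE = 16   # words per cache line
NUM_LINES = 64   # lines in the cache

def count_misses(addresses):
    """Count misses of a direct-mapped cache over an address trace."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE      # which memory line this word lies in
        index = line % NUM_LINES      # cache slot the line maps to
        tag = line // NUM_LINES       # identifies the line within that slot
        if tags[index] != tag:        # miss: fetch line, evict old one
            tags[index] = tag
            misses += 1
    return misses

N = 4096
sequential = list(range(N))                         # unit-stride access
strided = [(i * LINE_SIZE) % N for i in range(N)]   # line-sized stride

print(count_misses(sequential))  # 256: one miss per line (N / LINE_SIZE)
print(count_misses(strided))     # 4096: every access misses
```

The sequential trace touches each line 16 times but misses only on the first touch; the strided trace revisits each cache slot with a different tag every time, so no reuse survives.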
As an application example, in real-time video processing systems access to all the information and temporary results becomes a bottleneck; furthermore, the use of a compression module does not entail an increase in system performance, since access to compressed data is usually data-dependent and an irregular memory access pattern must be used.

Multiple parallel processing entities are possible in current FPGA devices [10], but this parallelism is forfeited when they try to access external memories. From now on in this paper, the term "external memories" will refer to all the memory chips connected to the FPGA. The inherent sequential behavior of the external memories may limit the system's performance. Therefore, this potential bottleneck must be dealt with efficiently in high-performance systems. Access to external memories must be implemented in specific time windows when implementing massive parallel data-paths (with fine-pipelined processing structures) to avoid data collisions [11]. This task is critical, and it would be useful to abstract the memory access to facilitate the design of multiple parallel entities with intensive external memory access requirements. We describe here a generic memory-control architecture designed specifically for reconfigurable hardware (FPGA devices) to be used in embedded

Journal of Systems Architecture 56 (2010) 442–451. doi:10.1016/j.sysarc.2010.05.007

* Corresponding author at: Department of Computer Architecture and Technology, University of Granada, 18071 Granada, Spain. Tel.: +34 607195837. E-mail addresses: mvanegas@atc.ugr.es (M. Vanegas), mtomasi@atc.ugr.es (M. Tomasi), jdiaz@atc.ugr.es (J. Díaz), eduardo@atc.ugr.es (E. Ros).
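The arbitration idea sketched in the abstract can be modelled in a few lines (class and port names here are our own illustrative choices, not the paper's interfaces): several abstract access ports queue transactions at their own slow pace, while a scheduler that is effectively faster than any single port drains them in round-robin order, so every port makes progress and no memory bandwidth is wasted on idle ports.

```python
from collections import deque

# Hypothetical sketch of a round-robin memory access scheduler
# serving several abstract access ports (AAPs). Names and the
# one-transaction-per-port-per-cycle policy are illustrative only.

class AbstractAccessPort:
    def __init__(self, name):
        self.name = name
        self.queue = deque()          # pending memory transactions

    def request(self, addr):
        self.queue.append((self.name, addr))

class MemoryScheduler:
    """Serves at most one pending transaction per port per cycle,
    visiting ports in a fixed round-robin order; this models a
    controller clocked faster than any individual port."""
    def __init__(self, ports):
        self.ports = ports
        self.served = []              # transactions in service order

    def cycle(self):
        for port in self.ports:       # round-robin: fixed port order
            if port.queue:
                self.served.append(port.queue.popleft())

    def run_until_drained(self):
        while any(p.queue for p in self.ports):
            self.cycle()

ports = [AbstractAccessPort(f"aap{i}") for i in range(3)]
for addr in (0, 4, 8):                # each port queues three requests
    for p in ports:
        p.request(addr)

sched = MemoryScheduler(ports)
sched.run_until_drained()
print(sched.served[:3])  # first cycle serves one request from each port
```

Under this policy a port that issues transactions at a third of the scheduler's rate never stalls, which is the essence of letting slow AAPs share one fast external memory.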