MATLAB For Signal Processing On Multi-Processors and Multi-Cores

Siddharth Samsi, Vijay Gadepally, and Ashok Krishnamurthy

Abstract—MATLAB® is the de facto language of choice for algorithm development in signal and image processing. While this work is traditionally done using sequential MATLAB running on desktop systems, recent years have seen a surge of interest in running MATLAB in parallel to take advantage of multi-processor and multi-core systems. In this paper, we discuss three variations of multi-processor parallel MATLAB, two of which are available as commercial, supported products. We also consider running MATLAB with key computations sped up using multi-threaded computations on multi-core GPGPUs. Two signal processing kernels (FFT and convolution) and two full applications (SAR imaging and Superconducting Quantum Interference Devices) are used to illustrate the use of parallel MATLAB.

Index Terms—MATLAB, GPGPU, Parallelization, Supercomputing, CUDA, Mex

I. INTRODUCTION

Developments in microprocessor technologies have resulted in most processors having multiple computing cores on a single chip. As a result, today's distributed-memory high performance computers (HPCs) have multiple CPUs (2-4) in each node, with each CPU having multiple cores (2-8). The typical programming methodology for such distributed-memory HPCs is some form of message passing, typically MPI. On the other hand, General Purpose Graphics Processing Units (GPGPUs or GPUs) are emerging as an alternative architecture for many computationally intensive tasks, including signal processing. GPGPUs have a large number of computing cores (128-1000) and are typically programmed using threads.

Parallel MATLAB has been actively developed over the past several years, and several commercial and academic versions are available [1], [2], [3], [4], [5], [6]. Using MATLAB with GPGPUs is a relatively recent development, and the products are not yet as mature.
The options for multi-core GPGPUs are to (a) create and compile CUDA-based mex functions [7], or (b) use MATLAB add-ons such as Jacket [8] or GPGPUmat [9], which accelerate selected MATLAB functions.

Signal processing algorithm developers who use MATLAB need to know these options and their tradeoffs to stay productive. In this paper, we walk the reader through the different multi-processor MATLAB choices: (a) the Parallel Computing Toolbox (PCT) and the MATLAB Distributed Computing Server (MDCS) [10]; (b) StarP from Interactive Supercomputing Inc. [11]; and (c) pMATLAB/bcMPI from MIT Lincoln Laboratory/Ohio Supercomputer Center [12], [13]. We then look at the multi-core MATLAB choices: (a) CUDA-based mex functions and (b) MATLAB add-ons. For each of these technologies, we compare the programming effort and the performance improvements observed with popular signal processing kernels and applications. The main message for the reader is that it is possible to exploit today's multi-core and multi-processor systems to effectively simulate signal processing problems with large memory and/or computational requirements, while staying in the comfortable environment of MATLAB. The required changes to sequential MATLAB code are usually quite small and can be made with ease. Because the multi-core and multi-processor implementations were carried out on different systems and for different problem sizes, their results are not compared directly.

S. Samsi, V. Gadepally and A. Krishnamurthy are with the Ohio Supercomputer Center, Columbus, Ohio 43212.

II. MULTI-THREADING IN MATLAB

The simplest approach to leveraging multiple processor cores in MATLAB is multi-threading. Since MATLAB supports multi-threading natively [14], this approach is a simple, non-intrusive way to use multiple cores on a system. This type of multi-threading can be broadly compared to the OpenMP [15], [16] approach to parallelism.
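To make the OpenMP comparison concrete, here is a minimal Python sketch (not MATLAB's actual implementation) of the data-parallel pattern that implicit multi-threading of an element-wise operation resembles: the index range is split into contiguous blocks, as in an OpenMP-style static schedule, and each worker thread applies the operation to its own block. The helper names `block_ranges` and `parallel_elementwise` are hypothetical, and capping the thread count at `os.cpu_count()` is an assumption made for illustration. Because of CPython's global interpreter lock, pure-Python work like this does not actually run faster on several threads; the sketch shows the work decomposition only, which MATLAB's multi-threaded built-ins are assumed to perform in compiled code.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, nworkers):
    """Split indices 0..n-1 into nworkers contiguous (start, stop) blocks.

    Mirrors an OpenMP-style static schedule: block sizes differ by at most one.
    """
    base, extra = divmod(n, nworkers)
    ranges = []
    start = 0
    for w in range(nworkers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_elementwise(func, data, nthreads=None):
    """Apply func to every element of data using a pool of worker threads.

    Assumption for illustration: the thread count is capped at the number of
    cores reported by os.cpu_count().
    """
    cores = os.cpu_count() or 1
    nthreads = min(nthreads or cores, cores)
    out = [None] * len(data)

    def work(lo, hi):
        # Each thread writes only its own disjoint slice of out, so no lock is needed.
        for i in range(lo, hi):
            out[i] = func(data[i])

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        futures = [pool.submit(work, lo, hi)
                   for lo, hi in block_ranges(len(data), nthreads)]
        for fut in futures:
            fut.result()  # Re-raise any exception raised in a worker thread.
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(8)]
    y = parallel_elementwise(math.sqrt, x, nthreads=4)
    # Compare against the sequential list comprehension.
    print(y == [math.sqrt(v) for v in x])
```

Here `parallel_elementwise(math.sqrt, x, nthreads=4)` returns the same list as `[math.sqrt(v) for v in x]`. A static block schedule is a natural choice for uniform element-wise work because every thread gets a nearly equal share and never touches another thread's output slice.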
The built-in multi-threading in MATLAB requires no intervention on the part of the user and is enabled by default. However, the maximum number of parallel threads cannot exceed the number of cores available on the system. The performance gains obtained by using multiple cores on a single system are also limited, and they vary with the specific computation as well as the data size. Fig. 1 illustrates this point. On a 16-core system, a maximum speedup of slightly over 7 (less than half of the ideal speedup of 16) was seen for the multiplication and sqrt operations, whereas the trigonometric function sin() showed a speedup of slightly under 3. This test was performed on a 4-socket quad-core AMD system with 64 GB of RAM running Red Hat Enterprise Linux. While multi-threaded computation is the easiest entry into parallel computing with MATLAB, the performance gains are usually limited, so this approach should be viewed only as a first step in improving code performance.

III. MULTI-PROCESSOR MATLAB

The most common approach to overcoming the performance limitations of sequential MATLAB is to distribute an application over multiple nodes of a commodity cluster. Typical performance limitations of sequential MATLAB can be broadly classified into two areas: capacity and capability. The problem of capacity manifests itself as the inability for existing