An Introduction to CUDA Programming

Irina Mocanu
University POLITEHNICA Bucharest
irina.mocanu@cs.pub.ro

Abstract. Graphics boards have become so powerful that they are used for mathematical computations, such as matrix multiplication and transposition, which are required for complex visual and physics simulations in computer games. NVIDIA has supported this trend by releasing the CUDA (Compute Unified Device Architecture) interface library, which allows application developers to write code that can be uploaded to an NVIDIA-based card for execution by NVIDIA's massively parallel GPUs. This paper is an introduction to CUDA programming based on the documentation in [2] and [4].

Introduction

The programmable graphics processor unit (GPU) has evolved into an absolute computing workhorse. Today's GPUs offer a lot of resources for both graphics and non-graphics processing.

Data-parallel processing maps data elements to parallel processing threads. Many applications that process large data sets such as arrays can use a data-parallel programming model to speed up their computations. In 3D rendering, large sets of pixels and vertices are mapped to parallel threads. Similarly, image and media processing applications can map image blocks and pixels to parallel processing threads. Many algorithms outside image rendering and processing are also accelerated by data-parallel processing.

In this context, NVIDIA developed CUDA (Compute Unified Device Architecture) [1], a new hardware and software architecture for issuing and managing computations on the GPU as a data-parallel computing device, without the need to map them to a graphics API. It is available for the GeForce 8 Series, Quadro FX 5600/4600, and Tesla solutions.

This paper presents the principal features of CUDA programming: the CUDA architecture and the CUDA application programming interface, based on the documentation in [2] and [4].
The paper also includes a simple CUDA program for adding two matrices, presented in [3], which uses the parallel capabilities of CUDA.

Figure 1. The CUDA software stack

The CUDA Architecture