NEW ALGORITHMS FOR GPU STREAM COMPACTION
A Comparative Study

Pedro Miguel Moreira 1,2, Luís Paulo Reis 2,3 and A. Augusto de Sousa 2,4

1 ESTG-IPVC, Instituto Politécnico de Viana do Castelo, Viana do Castelo, Portugal
2 DEI/FEUP, Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
3 LIACC, Laboratório de Inteligência Artificial e Ciência de Computadores, Porto, Portugal
4 INESC-Porto, Instituto de Engenharia de Sistemas e Computadores do Porto, Portugal

Keywords: Stream Compaction, Parallel Algorithms, Parallel Processing, Graphics Hardware.

Abstract: With the advent of GPU programmability, many applications have moved computationally intensive tasks onto the GPU. Some of them compute intermediate data composed of a mixture of elements that are relevant and irrelevant with respect to further processing tasks. Hence, the ability to discard irrelevant data and preserve the relevant portion is a desirable feature, with benefits for subsequent computational effort, memory, and communication bandwidth. Parallel stream compaction is an operation that, given a discriminator, outputs the valid elements and discards the rest. In this paper we contribute two original algorithms for parallel stream compaction on the GPU. We tested and compared our proposals with state-of-the-art algorithms on different data sets. Results demonstrate that our proposals can outperform prior algorithms. Analysis of the results also demonstrates that no single algorithm is best for all data distributions, and that an optimal setting is difficult to achieve without prior knowledge of the data characteristics.

1 INTRODUCTION

Graphics Processing Units (GPUs) are parallel platforms which provide high computational power with very large memory bandwidth at low cost. With the advent of GPU programmability, many algorithms that were usually performed by the CPU can now run on the GPU, taking advantage of its parallel processing capabilities.
Nowadays, GPUs are compelling programmable platforms, not only in the graphics domain but also for general purpose, computationally intensive tasks. This has led to a relatively new research area focused on mapping general purpose computation to graphics processing units, known as GPGPU (Owens et al., 2007; GPGPU, 2008).

Stream compaction, also known as stream non-uniform reduction or stream filtering, takes a data stream as input, uses a discriminator to select a wanted subset of elements, and outputs a compacted stream of the selected elements, discarding the rest.

Several computer graphics applications that make use of the GPU programmable architecture may take advantage of parallel stream compaction algorithms in several ways. Key benefits, enabled by the exclusion of non-relevant data, comprise: savings in computational effort in further processing stages; a smaller memory footprint; and savings in bandwidth when data has to be read back to the CPU. Stream compaction is also a fundamental component of algorithms that deal with data partitioning (e.g. some sorting algorithms and space hierarchies). Reported work taking advantage of GPU stream compaction includes: collision detection (Horn, 2005; Greß et al., 2006), ray-tracing (Roger et al., 2007), shadow mapping (Lefohn et al., 2007), point list generation (Ziegler et al., 2006), and, in general, algorithms that make use of data partitioning.

The current OpenGL specification (OpenGL v.2.1) (Segal and Akeley, 2006) exposes two GPU programmable units: the vertex and the fragment processors. A third programmable unit, the geometry processor, was recently exposed through OpenGL extensions (OpenGL Architecture Review Board, 2008), but it has limited support and is available only on very recent GPUs. Current GPUs are designed with several vertex and fragment processor units, which gives them a high degree of parallelism.
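
As a concrete reference for the stream compaction operation defined in this introduction, the following is a minimal sequential sketch in Python. It is not one of the algorithms proposed in this paper; it assumes the common formulation in which a discriminator flags each element, an exclusive prefix sum over the flags yields each kept element's output position, and a final indirect write (scatter) places the element there. The names `compact` and `is_valid` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def compact(stream: Sequence[T], is_valid: Callable[[T], bool]) -> List[T]:
    """Illustrative stream compaction (hypothetical helper, not the paper's method)."""
    # Step 1: apply the discriminator; 1 marks a wanted element, 0 an unwanted one.
    flags = [1 if is_valid(x) else 0 for x in stream]

    # Step 2: an exclusive prefix sum of the flags gives, for each element,
    # the number of wanted elements that precede it, i.e. its output index.
    inclusive = list(accumulate(flags))
    offsets = [total - flag for total, flag in zip(inclusive, flags)]
    count = inclusive[-1] if inclusive else 0

    # Step 3: scatter (indirect write) each wanted element to its output index.
    out: list = [None] * count
    for x, flag, offset in zip(stream, flags, offsets):
        if flag:
            out[offset] = x
    return out


if __name__ == "__main__":
    # Keep the non-zero entries of a sparse stream, preserving their order.
    print(compact([3, 0, 0, 7, 0, 5], lambda v: v != 0))
```

In this sketch steps 1 and 3 are independent per element, and step 2 is a prefix sum, which is why the operation lends itself to parallel execution. Step 3 relies on indirect writing, so how it maps to a given GPU programmable unit depends on that unit's capabilities.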
The vertex processor has scatter capabilities (indirect writing) and also gather capabilities (indirect reading), although the latter suffers from limited support and performance issues. The fragment processor has only gather capabilities, through texture fetching.

More recently, the so-called Unified Architec-

Moreira P., Reis L. and de Sousa A. (2009). NEW ALGORITHMS FOR GPU STREAM COMPACTION - A Comparative Study. In Proceedings of the Fourth International Conference on Computer Graphics Theory and Applications, pages 119-128. DOI: 10.5220/0001783601190128. Copyright © SciTePress