CGI2015 manuscript No. (will be inserted by the editor)

Efficient Grid Construction on Streaming Architectures

Vasco Costa · João M. Pereira · Joaquim A. Jorge
INESC-ID, Instituto Superior Técnico, University of Lisbon

Abstract Grid space partitioning is a technique to speed up queries to graphics databases. We present a parallel grid construction algorithm that efficiently constructs a structured grid on GPU hardware. Our approach is substantially faster than existing uniform grid construction algorithms, especially on non-homogeneous scenes. Indeed, it can populate a grid in real time (at rates over 25 Hz) for architectural scenes with 10 million triangles.

Keywords Grids · Space partitioning · Parallel · GPU

1 Introduction

Grids are a spatial partitioning scheme that tessellates space into parallelotope cells. Grid subdivision methods are popular because they can speed up graphics algorithms that perform spatial queries. Relevant applications include fluid simulation and visualization, occlusion culling, and ray tracing, among others.

This work focuses on efficient grid construction algorithms for parallel stream processor architectures such as GPUs. Existing algorithms can populate a grid in time linear in the number of objects to be placed in the grid, provided that each object occupies at most a single grid cell. However, in typical polygon meshes each object can occupy more than one grid cell, which degrades performance on parallel architectures because the workload is poorly distributed among processing threads.

In the present paper we describe a structured grid construction algorithm that solves this problem. Our main contributions include:

– A grid population algorithm that is up to nine times faster than state-of-the-art uniform grid initialization techniques.
Our method makes it possible to populate grids for architectural scenes with 10 million triangles at rates over 25 Hz (Section 3).
– A benchmark evaluation of our grid construction algorithm, which shows performance gains over the state of the art on different test scenes (Sections 4, 5).

2 Related Work

Grid spatial partitioning techniques can reduce the number of ray/object intersection queries required for ray casting.

Lagae and Dutré described algorithms for compact grid construction on the CPU [3]. Their approach expanded on previous work, contributing a GPU algorithm for compact grids that uses atomic operations. However, these atomic operations can slow down some parallel architectures.

To overcome this limitation, Kalojanov et al. devised algorithms for sorted grid construction on the GPU [2]. These do not require atomic operations and instead rely on a radix sort of cell id/object id pairs. The radix sort runs in O(kN) time on a serial processor, which is linear in the number of pairs N, where k is a constant that depends on the cell id size in bits.

The parallel implementations of both these grid construction algorithms have a serial processing component that identifies the grid cells overlapped by each object.

On arbitrary meshes, each polygon can overlap a different number of grid cells. In these common cases both algorithms exhibit poor workload distribution among threads. These techniques also spawn one work thread per triangle, which causes further load bal-