IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 12, NO. 5, SEPTEMBER/OCTOBER 2006

A Generic and Scalable Pipeline for GPU Tetrahedral Grid Rendering

Joachim Georgii and Rüdiger Westermann

Abstract—Recent advances in algorithms and graphics hardware have made it possible to render tetrahedral grids at interactive rates on commodity PCs. This paper extends this work by presenting a direct volume rendering method for such grids that supports both current and upcoming graphics hardware architectures, large and deformable grids, and different rendering options. At the core of our method is the idea of sampling tetrahedral elements along the view rays entirely in local barycentric coordinates. Sampling then requires minimal GPU memory and texture access operations, and it maps efficiently onto a feed-forward pipeline of multiple stages performing computation and geometry construction. We propose spawning each rendered element from a single vertex. This makes the method amenable to upcoming Direct3D 10 graphics hardware, which allows geometry to be created on the GPU. With only slight modifications, the algorithm can be used to render per-pixel iso-surfaces and to perform tetrahedral cell projection. Because our method requires neither pre-processing nor an intermediate grid representation, it can efficiently deal with dynamic and large 3D meshes.

Index Terms—Direct volume rendering, unstructured grids, programmable graphics hardware

1 INTRODUCTION AND MOTIVATION

Although recent advances in graphics hardware have made it possible to render tetrahedral grids efficiently on commodity PCs, interactive rendering of large and deformable grids is still one of the main challenges in scientific visualization. Such grids are increasingly encountered in a number of different applications, ranging from plastic and reconstructive surgery and virtual training simulators to fluid and solid mechanics.
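To make the notion of sampling in local barycentric coordinates concrete, the following is a minimal CPU-side sketch in Python. It is not the GPU implementation described in this paper. It assumes the common formulation in which the element matrix is the inverse of the 3x3 matrix whose columns are the edge vectors from the first vertex to the other three. All function names are hypothetical and chosen for illustration only.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def element_matrix(v0, v1, v2, v3):
    """Rows of the inverse of [v1-v0 | v2-v0 | v3-v0] (assumed formulation).

    This is the per-element quantity that only has to be computed once.
    """
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    det = dot(e1, cross(e2, e3))  # six times the signed element volume
    if det == 0:
        raise ValueError("degenerate tetrahedron")
    rows = (cross(e2, e3), cross(e3, e1), cross(e1, e2))
    return [tuple(c / det for c in row) for row in rows]


def barycentric(matrix, v0, p):
    """Barycentric coordinates of p with respect to (v0, v1, v2, v3)."""
    d = sub(p, v0)
    l1, l2, l3 = (dot(row, d) for row in matrix)
    return (1.0 - l1 - l2 - l3, l1, l2, l3)


def is_inside(lam, eps=1e-9):
    """A point lies in the element if no coordinate is negative."""
    return all(l >= -eps for l in lam)


def interpolate(lam, scalars):
    """Linearly interpolate per-vertex scalars at the sample point."""
    return sum(l * s for l, s in zip(lam, scalars))


# Usage: sample the unit tetrahedron at an interior point.
v0, v1, v2, v3 = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
m = element_matrix(v0, v1, v2, v3)
lam = barycentric(m, v0, (0.25, 0.25, 0.25))
value = interpolate(lam, (0.0, 1.0, 2.0, 3.0))
```

Once the matrix is available, each further sample point in the same element costs only three dot products, which is why recomputing the matrix per sample point is wasteful.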
The weakness of GPU-based volume rendering techniques for tetrahedral grids is that they do not effectively exploit the potential of recent GPUs. The reason lies in the re-sampling process for tetrahedral elements. At every sample point, this process requires the geometry of the element that contains the point. The geometry is used to compute the point's position in the local coordinate space of the element. Most generally, an element matrix built from the element's vertex coordinates is used for this purpose.

For every element this matrix has to be computed only once and can then be used to re-sample the data at every sample point in its interior. To do so, a container storing the matrices of all elements has to be created on the GPU. This approach significantly increases the memory requirements. Moreover, because the re-sampling is performed in the fragment stage, every fragment needs to be assigned the unique identifier of the element that contains it in order to address the respective matrix. In scan-conversion algorithms this can only be done by issuing these identifiers as additional per-vertex attributes when rendering the tetrahedral elements. Unfortunately, because every vertex is in general shared by many elements, a shared vertex list can no longer be used to represent the grid geometry on the GPU. This causes an additional increase in memory.

To avoid the memory overhead induced by pre-computations, element matrices can instead be calculated anew for every sample point. But then the same computations, including multiple memory access operations to fetch the respective coordinates, have to be performed for all sample points in the interior of a single element, thereby wasting a significant portion of the GPU's compute power. As before, identifiers are required to access vertex coordinates, and thus a shared vertex array cannot be used.

Joachim Georgii, E-mail: georgii@in.tum.de.
Rüdiger Westermann, E-mail: westermann@in.tum.de.
All authors are with the Computer Graphics & Visualization Group, Technische Universität München.
Manuscript received 31 March 2006; accepted 1 August 2006; posted online 6 November 2006. For information on obtaining reprints of this article, please send e-mail to: tvcg@computer.org.

1.1 Contribution

In this paper we present a GPU pipeline for the rendering of tetrahedral grids that avoids the aforementioned drawbacks. This pipeline is scalable with respect to both large data sets and future graphics hardware. The proposed method has the following properties:

• Per-element calculations are performed only once.
• Tetrahedral vertices and attributes can be shared in vertex and attribute arrays.
• Apart from the shared vertex and attribute arrays, almost no additional memory is required on the GPU.
• Re-sampling of (deforming) tetrahedral elements is performed using a minimal memory footprint.

1.2 System Overview

To achieve our goal we propose a generic and scalable GPU rendering pipeline for tetrahedral elements. This pipeline is illustrated in Figure 1. It consists of multiple stages performing element assembly, primitive construction, rasterization and per-fragment operations.

Fig. 1. Overview of the GPU rendering pipeline.

To render a tetrahedral element, the pipeline is fed with a single vertex, which carries all information necessary to assemble the element geometry on the GPU. This stage is described in Section 3.1. The assembled geometry is then passed to the construction stage, where a renderable representation is built.

The construction stage is explicitly designed to account for the functionality of upcoming graphics hardware. With Direct3D 10 compliant hardware and geometry shaders [1] it will be possible to create additional geometry on the graphics subsystem.
In particular, triangle strips or fans composed of several vertices, each of which can be