Optimized Data Transfer for Time-dependent, GPU-based Glyphs

S. Grottel, G. Reina, and T. Ertl
Institute for Visualization and Interactive Systems, Universität Stuttgart

ABSTRACT

Particle-based simulations are a popular tool for researchers in various sciences. In combination with the availability of ever larger COTS clusters and the consequently increasing number of simulated particles, the resulting datasets pose a challenge for real-time visualization. Additionally, the semantic density of the particles exceeds the possibilities of basic glyphs such as splats or spheres, resulting in dataset sizes larger by at least an order of magnitude. Interactive visualization on common workstations requires careful optimization of the data management, especially of the transfer between CPU and GPU. We propose a flexible benchmarking tool along with a series of tests that allows evaluating the performance of different CPU/GPU combinations in relation to a particular implementation. We evaluate different upload strategies and rendering methods for point-based compound glyphs suitable for representing the aforementioned datasets. CPU- and GPU-based approaches are compared with respect to their rendering and storage efficiency to point out the optimal solution when dealing with time-dependent datasets. The results of our research are of general interest, since they can be transferred to other applications where CPU-GPU bandwidth and a high number of graphical primitives per dataset pose a problem. The tool set employed for streamlining the measurement process is made publicly available.

Index Terms: I.3.6 [Computer Graphics]: Methodology and Techniques; I.3.6 [Computer Graphics]: Graphics data structures and data types; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism

1 INTRODUCTION

The performance-optimized rendering of points or splats has been investigated for some time now.
The applications of these techniques can be roughly divided into two main areas. The first is point set surface rendering, where the geometry of a single point is usually a very simple surface (circular or elliptic splats [3]). The rendering quality of such splats has been steadily improved over the years to yield high surface quality (see [2] and [14]). The other main area employs different kinds of glyphs with higher semantic density. This includes rendering such glyphs on the GPU using point billboards for particle datasets (e.g. see Figure 1, [21], or [11]) and even more complex glyphs for information visualization purposes [5].

To obtain interactive performance, much effort has been dedicated to developing efficient storage, such as in-core representations and hierarchical data structures (for example in [18] or [19], among many others). Linear memory layouts have been appreciated not only for their benefits for rendering performance, but also for their advantages when rendering out-of-core data [13], which is why we employ this approach in our visualization system as well. However, in all the related work we know of, the authors often make simplifying assumptions regarding first-level data storage and transfer to the GPU. One assumption is that the visualized data is stored in GPU memory and read from static vertex buffer objects, which of course ensures optimal performance. However, this is not possible when handling time-dependent data. The other assumption concerns the best-performing upload techniques that need to be employed to cope with such dynamic data, which at first glance also seem an obvious choice. Software capable of handling such data has been shown in [8]; however, there are many factors that can potentially influence the resulting performance, ranging from the hardware choice and system architecture over driver issues to implementation issues (in the application as well as in drivers and hardware).
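The bandwidth pressure behind these upload concerns can be illustrated with a back-of-the-envelope calculation. The following sketch uses illustrative assumptions (particle count, a hypothetical 20-byte interleaved vertex layout, and a target frame rate), not figures measured in this work:

```python
# Rough per-frame upload estimate for streaming time-dependent
# particle data from CPU to GPU. All concrete numbers below are
# illustrative assumptions, not measurements from the paper.

def upload_rate_mb_per_s(num_particles, bytes_per_particle, fps):
    """Sustained CPU-to-GPU transfer rate (MB/s) needed to stream
    one full timestep of vertex data per rendered frame."""
    return num_particles * bytes_per_particle * fps / (1024 ** 2)

# Assumed layout: position (3 x float32) + radius (float32)
# + color (4 x uint8) = 20 bytes per glyph, 1,000,000 glyphs,
# 30 frames per second:
rate = upload_rate_mb_per_s(1_000_000, 20, 30)
print(f"{rate:.1f} MB/s")  # -> 572.2 MB/s for vertex streaming alone
```

Even this modest layout already demands hundreds of MB/s of sustained transfer, which is why static vertex buffer objects are not an option for time-dependent data and the choice of upload mechanism becomes performance-critical.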
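Rendering raycast glyphs on point billboards also requires a conservative screen-space point size so that the billboard covers the glyph's silhouette. A common way to size a sphere billboard, sketched below under the assumption of a symmetric perspective projection with the sphere center on the view axis, is to project the sphere radius onto the viewport (function name and parameters are ours, for illustration):

```python
import math

def point_size_px(radius, distance, fov_y_deg, viewport_height_px):
    """Approximate screen-space size in pixels of a sphere glyph's
    point billboard under symmetric perspective projection.
    Assumes the sphere center lies at `distance` along the view axis."""
    half_fov = math.radians(fov_y_deg) / 2.0
    return viewport_height_px * radius / (distance * math.tan(half_fov))

# A unit sphere 10 units away, 90 degree vertical FOV, 1000 px viewport:
print(round(point_size_px(1.0, 10.0, 90.0, 1000), 3))  # -> 100.0
```

Note that for off-axis spheres the projected silhouette is an ellipse, so this on-axis estimate must be enlarged (or replaced by an exact silhouette computation) to avoid clipping the glyph at the billboard border, which is one reason the cost of silhouette calculations matters for raycast GPU glyphs.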
To our knowledge, these well-informed choices are very rarely supported by hard numbers and a direct comparison of the many alternatives, so we want to fill this gap. Performance analyses exist for the uploading and downloading of texture data as well as for shader arithmetic [7]. A more generic benchmarking tool exists [4], but it does not cover the aspects we are interested in, so we aim to provide more detailed data on the available vertex upload mechanisms.

Figure 1: A typical real-world particle dataset from the field of molecular dynamics, containing a mixture of ethanol and heptafluoropropane, 1,000,000 molecules altogether, represented by GPU-based compound glyphs.

The main contributions of this paper are the provision of a benchmarking tool as well as the subsequent investigation of the performance impact of different vertex upload strategies and silhouette calculations for raycast GPU glyphs. We put these figures into context by applying the findings to a concrete visualization example from molecular dynamics simulations in the area of thermodynamics. Widespread programs for molecular visualization exist, for example VMD or PyMOL; however, their performance is not satisfactory when working with current datasets, which consist of several hundreds of thousands of molecules. One flaw is the lack of proper out-of-core rendering support for time-dependent datasets, and another is insufficient optimization and visual quality, as the capabilities of current GPUs are not significantly harnessed. Approaches