Technical Section

Parallel GPU-based data-dependent triangulations

Michal Červeňanský a,*, Zsolt Tóth a, Juraj Starinský a, Andrej Ferko a, Miloš Šrámek b

a Faculty of Mathematics, Physics and Informatics, Comenius University, Slovakia
b Austrian Academy of Sciences, Vienna, Austria

Article info
Article history: Received 22 April 2009; received in revised form 30 December 2009; accepted 8 January 2010
Keywords: Data-dependent triangulation; Image reconstruction; Graphics hardware; GPGPU

Abstract
In this paper we introduce a new technique for data-dependent triangulation which is suitable for implementation on a GPU. Our solution is based on a new parallel version of the well-known Lawson’s optimization process and is fully compatible with the restrictions of the GPU hardware. We test and compare the quality of our solution on an image reconstruction problem. In comparison with the standard implementations we achieve a significant speed-up (eight times on average) with comparable quality of the reconstructed image. Further, several other improvements and optimizations are introduced and tested, and the results are discussed in detail.
© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Parallelization of various computationally expensive problems is today made possible by the architecture of mainstream processors. Simultaneously, the generalization of GPU architectures to non-graphics applications has resulted in their wide usage in non-standard areas, which benefit from the inherent parallelism of GPUs. This observation leads us to the idea of accelerating the computation of optimal triangulations by such a GPU implementation. Nowadays, optimal triangulations are generated mainly on the CPU, and, depending on the type of the required property, optimality may be achieved only through a very time-consuming optimization process. Optimal triangulations are widely used in different branches of science and technology.
We focus our attention on the generation of locally optimal meshes by iterative improvement (optimization), which can be used in various geometrically defined problems, such as finite element simulations and image reconstruction. Specifically, we selected the image reconstruction problem (namely, edge-preserving magnification), which can be solved by a special case of optimal mesh called a data-dependent triangulation (DDT) [8]. The main advantage of this technique is its ability to fit the mesh structure to the underlying data. We do not generate the triangulation, but only optimize it. As input we require an arbitrary triangular mesh, and with the help of special cost functions and topological operations we generate an optimal one from it. With different choices of these functions we can obtain different properties of the resulting mesh.

Image reconstruction is only one of the application areas of data-dependent triangulations. We selected it because its results can be represented visually. In comparison with convolution-based techniques, DDT-based approaches produce visually more pleasing results in high-frequency areas. As an example, see the blocky artifacts in the edge areas in Fig. 1(a), obtained by convolution, compared with the results obtained by non-convolution approaches in Fig. 1(b) and (c).

For the creation of data-dependent meshes we choose an approach called Lawson’s optimization process. An efficient implementation of this technique on a GPU requires its parallelization. The processing pipeline of a graphics card, however, is not directly suitable for representing and maintaining the data structures usually used in this type of computation. Therefore, the main challenge was to work around the hardware restrictions and to design a fully parallel approach implementable on a GPU.
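The ingredients named above (an arbitrary input mesh, a cost function, and a topological operation) can be illustrated with a minimal sequential sketch of a Lawson-style optimization. It is only an illustration under stated assumptions, not the parallel GPU method of this paper. The cost used here (the jump of the plane gradient across each interior edge) is one possible data-dependent choice assumed for the example, and all function names (`plane_gradient`, `total_cost`, `improve_once`, `lawson_optimize`, `grid_mesh`) are hypothetical.

```python
import math
from itertools import combinations


def plane_gradient(p, q, r):
    """Gradient (dz/dx, dz/dy) of the plane through three (x, y, z) points."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    det = ux * vy - uy * vx  # twice the signed area; nonzero for a proper triangle
    return ((uz * vy - vz * uy) / det, (ux * vz - vx * uz) / det)


def edge_map(tris):
    """Map each undirected edge (a, b), a < b, to the triangles that contain it."""
    edges = {}
    for t, tri in enumerate(tris):
        for edge in combinations(sorted(tri), 2):
            edges.setdefault(edge, []).append(t)
    return edges


def total_cost(pts, tris):
    """Assumed cost: sum over interior edges of the gradient jump across the edge."""
    cost = 0.0
    for ts in edge_map(tris).values():
        if len(ts) == 2:  # interior edge shared by exactly two triangles
            g1 = plane_gradient(*(pts[i] for i in tris[ts[0]]))
            g2 = plane_gradient(*(pts[i] for i in tris[ts[1]]))
            cost += math.hypot(g1[0] - g2[0], g1[1] - g2[1])
    return cost


def orient(p, q, r):
    """Twice the signed area of triangle p, q, r in the xy-plane."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def improve_once(pts, tris, eps=1e-12):
    """Apply the first cost-reducing edge swap found; return True if one was made."""
    current = total_cost(pts, tris)
    for (a, b), ts in edge_map(tris).items():
        if len(ts) != 2:
            continue  # boundary edges cannot be swapped
        c = next(v for v in tris[ts[0]] if v not in (a, b))
        d = next(v for v in tris[ts[1]] if v not in (a, b))
        # Swap only inside a strictly convex quadrilateral: a and b must lie
        # on opposite sides of the new diagonal c-d (input mesh assumed valid).
        if orient(pts[c], pts[d], pts[a]) * orient(pts[c], pts[d], pts[b]) >= 0:
            continue
        candidate = list(tris)
        candidate[ts[0]], candidate[ts[1]] = (a, c, d), (b, c, d)
        if total_cost(pts, candidate) < current - eps:
            tris[:] = candidate
            return True
    return False


def lawson_optimize(pts, tris):
    """Swap edges until no single swap lowers the cost (a local optimum)."""
    tris = [tuple(t) for t in tris]
    while improve_once(pts, tris):
        pass
    return tris


def grid_mesh(n, height):
    """n x n grid of samples; every cell is split by the (x+1, y)-(x, y+1) diagonal."""
    pts = [(float(x), float(y), float(height(x, y))) for y in range(n) for x in range(n)]
    tris = []
    for y in range(n - 1):
        for x in range(n - 1):
            i = y * n + x
            tris += [(i, i + 1, i + n), (i + 1, i + n + 1, i + n)]
    return pts, tris


if __name__ == "__main__":
    # Synthetic "image" with a sharp feature along the line x == y.
    pts, start = grid_mesh(3, lambda x, y: abs(x - y))
    result = lawson_optimize(pts, start)
    print(total_cost(pts, start), "->", total_cost(pts, result))
```

Recomputing the total cost for every candidate swap is deliberately naive and keeps the sketch short; a practical implementation would re-evaluate only the edges affected by a swap. Because every accepted swap strictly lowers the cost and only finitely many triangulations of a fixed vertex set exist, the loop terminates in a locally optimal mesh.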
The main contribution of this paper is the solution of the above-mentioned problem. We introduce a parallel version of Lawson’s optimization process which is completely implementable on a GPU. We further present several optimizations and improvements of the basic parallel version. In Fig. 1(c) we can see that our approach gives results visually similar to those of the original CPU-based approach in Fig. 1(b), but its run time on a GPU is about eight times shorter. While the presented results are oriented to the image reconstruction problem, the technique is nearly general and can be applied to any DDT-related task and to an arbitrary data distribution.

Computers & Graphics 34 (2010) 125–135. Journal homepage: www.elsevier.com/locate/cag. doi:10.1016/j.cag.2010.01.001. 0097-8493/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: cervenansky@sccg.sk (M. Červeňanský).

This paper is structured as follows. In Section 2 we briefly survey previous work on data-dependent triangulations and on