Fast Two Dimensional Convex Hull on the GPU

Srikanth Srungarapu, Durga Prasad Reddy, Kishore Kothapalli, P. J. Narayanan
International Institute of Information Technology, Hyderabad
Gachibowli, Hyderabad, India – 500 032
Email: {srikanth_s@students., durgaprasad_b@students.}iiit.ac.in, {kkishore@, pjn@}iiit.ac.in

Abstract: General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises large computational power at a very low price. GPUs are best suited to regular data parallel algorithms. They are not directly amenable to algorithms with irregular data access patterns, such as convex hull and list ranking. In this paper, we present a GPU-optimized implementation for finding the convex hull of a two-dimensional point set. Our implementation tries to minimize the impact of irregular data access patterns. It can find the convex hull of 10 million random points in less than 0.2 seconds and achieves a speedup of up to 14 over the standard sequential CPU implementation. We also discuss some of the practical issues relating to the implementation of convex hull algorithms on massively multi-threaded architectures like that of the GPU.

I. INTRODUCTION

The advent of General Purpose Computing on the GPU (GPGPU) has established the GPU as a viable general purpose co-processor. The GPU architecture fits the data parallel computing model best, with a single processing kernel applied to a large data grid. The cores of the GPU execute in a Single Instruction, Multiple Data (SIMD) mode at the lowest level. Many data parallel algorithms have been developed on the GPU in the recent past [4], including FFT [15] and other scientific applications [16]. Primitives that are useful in building larger data parallel applications have also been developed on the GPU. These include parallel prefix sum (scan) [19], reduction, and sorting [27].
Regular memory access and high arithmetic intensity are key to extracting peak performance on the GPU. However, several important classes of applications have a low arithmetic intensity, irregular data access patterns, or both. Recent efforts are directed towards efficient implementations of irregular applications such as list ranking [26] and graph algorithms [22]. Finding the convex hull of a set of points is another such problem, with irregular memory access patterns and sequential dependencies.

The convex hull of a set Q of points is the smallest convex polygon P for which each point in Q is either on the boundary of P or in its interior. The portion of the convex hull that lies below (above) the line joining the leftmost and rightmost points is called the lower hull (upper hull). The convex hull [7] is one of the fundamental structures in computational geometry. One reason it is such an important geometric structure is that it is one of the simplest shape approximations for a given set of points. Other problems in computational geometry, such as Delaunay triangulation, Voronoi diagrams, and halfspace intersection, can be reduced to the convex hull. The convex hull problem also has practical applications in pattern recognition, operations research, and design automation; references [12], [13], and [28], to cite just a few, discuss some interesting applications in these areas.

Given the importance of the problem, it is essential that a fast and scalable implementation of the convex hull be available on modern architectures such as the GPU. Such an implementation can enable high performance implementations of other computational geometry problems such as those mentioned earlier. Our implementation of the convex hull on the GPU achieves a speedup of up to 14 over a standard sequential CPU implementation and is highly scalable.
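To make the definitions of the convex hull, lower hull, and upper hull given in this section concrete, the following is a minimal sequential sketch in Python. It uses Andrew's monotone chain method, which is swapped in here purely for illustration; it is not the GPU implementation presented in this paper. The names `cross`, `half_hull`, and `lower_upper_hull` are hypothetical helpers introduced for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle o-a-b.

    The result is positive when o -> a -> b makes a counter-clockwise turn.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def half_hull(points: List[Point]) -> List[Point]:
    """Build one chain of the hull from points given in traversal order."""
    chain: List[Point] = []
    for p in points:
        # Drop the last chain point while it would make a clockwise
        # or collinear turn with the incoming point.
        while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain


def lower_upper_hull(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Return (lower hull, upper hull) of a 2D point set.

    The lower hull runs from the leftmost to the rightmost point, and the
    upper hull runs back from the rightmost to the leftmost point. Both
    chains contain the two extreme points.
    """
    pts = sorted(set(points))            # sort by x, then y; drop duplicates
    lower = half_hull(pts)               # left-to-right sweep
    upper = half_hull(pts[::-1])         # right-to-left sweep
    return lower, upper


if __name__ == "__main__":
    sample = [(0, 0), (2, -2), (2, 1), (3, -1), (3, 3), (5, 0)]
    lower, upper = lower_upper_hull(sample)
    # Concatenating the chains without their last points gives the full
    # hull in counter-clockwise order.
    print(lower[:-1] + upper[:-1])
```

For the sample points, the lower hull is (0, 0), (2, -2), (5, 0) and the upper hull is (5, 0), (3, 3), (0, 0); the points (2, 1) and (3, -1) lie in the interior of the hull and appear in neither chain.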
On a two-dimensional data set of 10 million points, our implementation finds the convex hull in about 0.2 seconds. Our work can thus lead to efficient implementations of other important algorithms in computational geometry on GPUs.

A. Related Work

Several parallel algorithms have been proposed for the convex hull problem. In the fine grained parallel setting, algorithms have been described for many PRAM models, including the CRCW PRAM [1] and the CREW PRAM [5]. However, the PRAM model is a purely algorithmic model and ignores several factors such as the memory hierarchy, communication latency, and scheduling, among others. Hence, PRAM algorithms may not immediately fit novel architectures such as the GPU. Some of the popular parallel PRAM algorithms for convex hull are [20], [3], [25], [1], [5]. Of these, the quick hull algorithm is similar to the divide and conquer algorithm [25], [20]. However, the sub-problems formulated by quick hull are independent, because no further merging of solutions is required. Hence, we have used this algorithm to develop an efficient parallel implementation on the GPU.

M. Diallo [10] discusses a scalable parallel algorithm for building the convex hull on coarse grained multicomputers [9], which requires time $O(n \log n / p + T_s(n, p))$, where $T_s(n, p)$ refers to the time of a global sort of $n$ data items on a $p$-processor machine. In [6], the authors present a parallel algorithm for computing the convex hull, realized using the Bulk Synchronous

2011 Workshops of International Conference on Advanced Information Networking and Applications. 978-0-7695-4338-3/11 $26.00 © 2011 IEEE. DOI 10.1109/WAINA.2011.64