Fast Two Dimensional Convex Hull on the GPU
Srikanth Srungarapu Durga Prasad Reddy
Kishore Kothapalli P. J. Narayanan
International Institute of Information Technology, Hyderabad
Gachibowli, Hyderabad, India – 500 032.
Email: {srikanth_s@students., durgaprasad_b@students.}iiit.ac.in
{kkishore@, pjn@}iiit.ac.in
Abstract—General purpose programming on the graphics
processing units (GPGPU) has received a lot of attention in
the parallel computing community as it promises to offer a
large computational power at a very low price. GPGPU is
best suited for regular data parallel algorithms. GPUs are not
directly amenable to algorithms that have irregular data access
patterns, such as convex hull and list ranking. In this paper,
we present a GPU-optimized implementation for finding the
convex hull of a two dimensional point set. Our implementation
tries to minimize the impact of irregular data access patterns.
Our implementation can find the convex hull of 10 million
random points in less than 0.2 seconds and achieves a speedup
of up to 14 over the standard sequential CPU implementation.
We also discuss some of the practical issues relating to the
implementation of convex hull algorithms on massively multi-
threaded architectures like that of the GPU.
I. INTRODUCTION
The advent of General Purpose Computing on the GPU
(GPGPU) has established the GPU as a viable general purpose
co-processor. The GPU architecture fits the data parallel
computing model best, with a single processing kernel applied
to a large data grid. The cores of the GPU execute in a
Single Instruction, Multiple Data (SIMD) mode at the lowest
level. Many data parallel algorithms have been developed
on the GPU in the recent past [4], including FFT [15] and
other scientific applications [16]. Primitives that are useful
in building larger data parallel applications have also been
developed for GPUs. These include parallel prefix sum
(scan) [19], reduction, and sorting [27]. Regular memory
access and high arithmetic intensity are key to extracting
peak performance on GPUs. However, there are several
important classes of applications which have either a low
arithmetic intensity, or irregular data access patterns, or both.
Recent efforts are directed towards arriving at efficient imple-
mentations of irregular applications such as list ranking [26]
and graph algorithms [22]. Finding the convex hull of a set
of points is another such typical problem that has irregular
memory access patterns and sequential dependencies.
The convex hull of a set Q of points is the smallest convex
polygon P for which each point in Q is either on the boundary
of P or in its interior. The portion of the convex hull that
lies below (above) the line joining the leftmost and rightmost
points is called the lower hull (upper hull). The convex hull
[7] is one of the fundamental structures in computational
geometry. One reason for its importance is that it is one of
the simplest shape approximations for a given set of points.
Other problems in computational geometry, such as Delaunay
triangulation, Voronoi diagrams, and halfspace intersection,
can be reduced to the convex hull. Finding the convex hull
also has practical applications in pattern recognition,
operations research, and design automation; references [12],
[13], and [28], to cite just a few, discuss some interesting
applications in these areas. Given the importance of the
problem, it is essential to have a fast and scalable
implementation of the convex hull on modern architectures such
as the GPU. Such an implementation can enable high performance
implementations of other computational geometry problems
such as those mentioned earlier.
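The split into points above and below the line joining the leftmost and rightmost points, used in the definition of the upper and lower hull, can be illustrated with a standard orientation test. The following is a minimal sequential sketch in Python, not the implementation described in this paper; the function names `cross` and `split_by_extremes` are hypothetical and chosen only for illustration.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b).

    Positive when b lies to the left of the directed line o -> a,
    negative when it lies to the right, zero when the three are collinear.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def split_by_extremes(points):
    """Partition points by the line joining the leftmost and rightmost points.

    Points strictly above the line are candidates for the upper hull,
    points strictly below it are candidates for the lower hull.
    Points lying exactly on the line are dropped from both lists.
    """
    left = min(points)   # smallest x (ties broken by smallest y)
    right = max(points)  # largest x (ties broken by largest y)
    above = [p for p in points if cross(left, right, p) > 0]
    below = [p for p in points if cross(left, right, p) < 0]
    return left, right, above, below


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (2, 3), (2, -2), (1, 0)]
    print(split_by_extremes(pts))
```

Each point is classified independently of the others, which is the kind of regular, data parallel step that maps naturally onto a GPU.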
Our implementation for the convex hull on the GPU
achieves a speedup of up to 14 over a standard sequential
CPU implementation and is highly scalable. For instance, we
can find the convex hull of a two-dimensional data set of 10 million
points in about 0.2 seconds. Our work can thus lead to
efficient implementations of other important algorithms in
computational geometry on GPUs.
A. Related Work
Several parallel algorithms have been proposed for the convex
hull problem. In the fine grained parallel setting, algorithms
have been described for many PRAM models, including the
CRCW PRAM [1] and the CREW PRAM [5]. However,
it should be noted that the PRAM model is a purely
algorithmic model and ignores several factors such as the
memory hierarchy, communication latency, and scheduling,
among others. Hence, PRAM algorithms may not immediately
fit novel architectures such as the GPU.
Popular parallel PRAM algorithms for the convex hull include
those of [20], [3], [25], [1], and [5]. Of these, the quick hull
algorithm is similar to the divide and conquer algorithm [25],
[20]. However, the sub-problems formulated by quick hull
are independent, since no further merging of solutions is
required. Hence, we use this algorithm to develop
an efficient parallel implementation on the GPU.
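To make the independence of the quick hull sub-problems concrete, the following is a minimal sequential sketch of the textbook quick hull recursion in Python. It is an illustration under stated assumptions, not the GPU implementation developed in this paper; the names `quickhull` and `_hull_side` are hypothetical, and points are assumed to be tuples of numbers.

```python
def cross(o, a, b):
    """Positive when b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(a, b, pts):
    """Hull vertices strictly left of a -> b, ordered from a towards b.

    pts must contain only points strictly to the left of a -> b.
    """
    if not pts:
        return []
    # The point farthest from the line a -> b is always a hull vertex.
    far = max(pts, key=lambda p: cross(a, b, p))
    # Points inside triangle (a, far, b) are discarded. The two remaining
    # subsets are disjoint and are solved without reference to each other.
    outside_a_far = [p for p in pts if cross(a, far, p) > 0]
    outside_far_b = [p for p in pts if cross(far, b, p) > 0]
    # Concatenation only: the partial results need no merge step.
    return _hull_side(a, far, outside_a_far) + [far] + _hull_side(far, b, outside_far_b)


def quickhull(points):
    """Convex hull vertices in clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # leftmost and rightmost points
    upper = [p for p in pts if cross(a, b, p) > 0]
    lower = [p for p in pts if cross(b, a, p) > 0]
    return [a] + _hull_side(a, b, upper) + [b] + _hull_side(b, a, lower)


if __name__ == "__main__":
    sample = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (2, 0)]
    print(quickhull(sample))
```

The two recursive calls in `_hull_side` operate on disjoint point sets and their results are simply concatenated, which is the property that makes the algorithm attractive for parallel execution.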
M. Diallo [10] discusses a scalable parallel algorithm for
building the convex hull on coarse grained multicomputers [9],
which requires time $O\left(\frac{n \log n}{p} + T_s(n, p)\right)$, where $T_s(n, p)$
refers to the time of a global sort of $n$ data items on a $p$-processor
machine. In [6], the authors present a parallel algorithm for
computing the convex hull, realized using the Bulk Synchronous
2011 Workshops of International Conference on Advanced Information Networking and Applications
978-0-7695-4338-3/11 $26.00 © 2011 IEEE
DOI 10.1109/WAINA.2011.64