A High-Performance Software Graphics Pipeline Architecture
for the GPU
MICHAEL KENZEL, BERNHARD KERBL, and DIETER SCHMALSTIEG, Graz University of Technology, Austria
MARKUS STEINBERGER, Graz University of Technology, Austria and Max Planck Institute for Informatics, Germany
(a) (b) (c)
Fig. 1. Various scenes rendered by our software graphics pipeline in real-time on a GPU. (a) A smooth triangulation of the water surface in an animated ocean
scene is achieved via a custom pipeline extension that allows the mesh topology to dynamically adapt to the underlying heightfield. (b) Scene geometry
captured from video games like this still frame from Total War: Shogun 2 is used to evaluate the performance of our approach on real-world triangle distributions.
(c) Many techniques such as mipmapping rely on the ability to compute screen-space derivatives during fragment shading. Our pipeline architecture can
support derivative estimation based on pixel quad shading, used here to render a textured model of a heart with trilinear filtering; lower mipmap levels are
filled with a checkerboard pattern to visualize the effect. Total War: Shogun 2 screenshot courtesy of The Creative Assembly; used with permission.
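To make the derivative estimation mentioned in (c) concrete, the following is a minimal C++ sketch of how finite differences within a 2×2 pixel quad can drive mipmap level selection. It is an illustration under stated assumptions, not the implementation used in our pipeline: the names UV, Quad, Derivatives, quad_derivatives, and mip_level are hypothetical, a square texture is assumed, and the level formula follows the common log2-of-footprint convention in simplified form.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Texture coordinates of one fragment (hypothetical type for this sketch).
struct UV { float u, v; };

// A 2x2 pixel quad, laid out as
//   [0] (x,   y)     [1] (x+1, y)
//   [2] (x,   y+1)   [3] (x+1, y+1)
using Quad = std::array<UV, 4>;

// Estimated change of (u, v) per pixel step in x and in y.
struct Derivatives { UV ddx, ddy; };

// Estimate screen-space derivatives by finite differences within the quad.
// This is the coarse variant: one difference per axis, shared by all four
// pixels of the quad.
Derivatives quad_derivatives(const Quad& q) {
    return {
        { q[1].u - q[0].u, q[1].v - q[0].v },  // horizontal neighbor
        { q[2].u - q[0].u, q[2].v - q[0].v }   // vertical neighbor
    };
}

// Select a fractional mipmap level for a square texture of tex_size texels
// (assumption). Trilinear filtering would blend the two levels adjacent to
// the returned value using its fractional part.
float mip_level(const Derivatives& d, float tex_size, int num_levels) {
    float len_x = std::hypot(d.ddx.u, d.ddx.v) * tex_size;  // texels per pixel in x
    float len_y = std::hypot(d.ddy.u, d.ddy.v) * tex_size;  // texels per pixel in y
    float rho = std::max(len_x, len_y);
    if (rho <= 1.0f) return 0.0f;  // magnification: use the base level
    return std::min(std::log2(rho), float(num_levels - 1));
}
```

In this sketch, a quad in which u advances by 4 texels per pixel in x and v by 2 texels per pixel in y on a 256-texel texture selects level 2, since the larger footprint is 4 texels per pixel. Because the differences are taken between neighboring fragments, all four fragments of a quad have to be shaded together, even if some of them lie outside the triangle.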
In this paper, we present a real-time graphics pipeline implemented entirely
in software on a modern GPU. As opposed to previous work, our approach
features a fully-concurrent, multi-stage, streaming design with dynamic
load balancing, capable of operating efficiently within bounded memory. We
address issues such as primitive order, vertex reuse, and screen-space deriva-
tives of dependent variables, which are essential to real-world applications,
but have largely been ignored by comparable work in the past. The power of
a software approach lies in the ability to tailor the graphics pipeline to any
given application. In exploration of this potential, we design and implement
four novel pipeline modifications. Evaluation of the performance of our
approach on more than 100 real-world scenes collected from video games
shows rendering speeds within one order of magnitude of the hardware
graphics pipeline as well as significant improvements over previous work,
not only in terms of capabilities and performance, but also robustness.
CCS Concepts: • Computing methodologies → Rasterization; Graphics
processors; Massively parallel algorithms;
Additional Key Words and Phrases: Software Rendering, GPU, Graphics
Pipeline, Rasterization, CUDA
ACM Reference Format:
Michael Kenzel, Bernhard Kerbl, Dieter Schmalstieg, and Markus Steinberger.
2018. A High-Performance Software Graphics Pipeline Architecture for
the GPU. ACM Trans. Graph. 37, 4, Article 140 (August 2018), 15 pages.
https://doi.org/10.1145/3197517.3201374
Authors’ addresses: Michael Kenzel, michael.kenzel@icg.tugraz.at; Bernhard Kerbl,
bernhard.kerbl@icg.tugraz.at; Dieter Schmalstieg, dieter.schmalstieg@icg.tugraz.at,
Graz University of Technology, Institute of Computer Graphics and Vision, Infeldgasse
16, Graz, 8010, Austria; Markus Steinberger, markus.steinberger@icg.tugraz.at, Graz
University of Technology, Institute of Computer Graphics and Vision, Infeldgasse 16,
Graz, 8010, Austria, Max Planck Institute for Informatics, Saarland Informatics Campus
Building E1 4, Saarbrücken, 66123, Germany.
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
This is the author’s version of the work. It is posted here for your personal use. Not for
redistribution. The definitive Version of Record was published in ACM Transactions on
Graphics, https://doi.org/10.1145/3197517.3201374.
1 INTRODUCTION
For a long time now, the hardware graphics pipeline has been the
backbone of real-time rendering. However, while a hardware implementation
can achieve high performance and power efficiency,
flexibility is sacrificed. Driven by the need to support an ever growing
spectrum of ever more sophisticated applications, the graphics
processing unit (GPU) evolved as a tight compromise between
flexibility and performance. The graphics pipeline on a modern
GPU is implemented by special-purpose hardware on top of a large,
freely-programmable, massively-parallel processor. More and more
programmable stages have been added over the years. However,
the overall structure of the pipeline and the underlying rendering
algorithm have essentially remained unchanged for decades.
While evolution of the graphics pipeline proceeds slowly, GPU
compute power continues to increase exponentially. In addition
to the graphics pipeline, modern application programming interfaces
(APIs) such as Vulkan [Khronos 2016b], OpenGL [Khronos
2016a], or Direct3D [Blythe 2006], as well as specialized interfaces
like CUDA [NVIDIA 2016] and OpenCL [Stone et al. 2010] also
allow the GPU to be operated in compute mode, which exposes the
programmable cores of the GPU as a massively-parallel general-
purpose co-processor. Although the hardware graphics pipeline
remains at the core of real-time rendering, cutting-edge graphics
applications increasingly rely on compute mode to implement ma-
jor parts of sophisticated graphics algorithms that would not easily
map to the traditional graphics pipeline, such as tiled deferred
rendering [Andersson 2009], geometry processing (cloth simula-
tion) [Vaisse 2014], or texel shading [Hillesland and Yang 2016].
ACM Trans. Graph., Vol. 37, No. 4, Article 140. Publication date: August 2018.