A Cache-Aware Algorithm for PDEs on Hierarchical Data Structures

Frank Günther, Miriam Mehl, Markus Pögl, and Christoph Zenger

Institut für Informatik, TU München
Boltzmannstraße 3, 85748 Garching, Germany
{guenthef,mehl,poegl,zenger}@in.tum.de

Abstract. A major challenge in implementing up-to-date simulation software for various applications is to combine highly efficient mathematical methods on the one hand with an efficient use of modern computer architectures on the other. We concentrate on the solution of PDEs and demonstrate how to overcome the quandary that arises between cache efficiency and modern multilevel methods on adaptive grids. Our algorithm is based on stacks, the simplest possible and thus very cache-efficient data structures.

1 Introduction

In most implementations, competitive numerical algorithms for solving partial differential equations cause a non-negligible overhead in data access and thus cannot exploit the high performance of processors in a satisfactory way. This is mainly caused by the tree structures used to store the hierarchical data needed for methods like multi-grid and adaptive grid refinement.

We use space-filling curves as an ordering mechanism for our grid cells and, based on this order, replace the tree structure with data structures that are processed linearly. For this, we restrict ourselves to grids associated with space-trees (allowing local refinement) and to difference stencils that are (in a certain sense) local. In fact, the only kind of data structure used in our implementation is a fixed number of stacks. As stacks can be considered the simplest data structures used in computer science, allowing only the two basic operations push and pop¹, data access becomes very fast, even faster than the common access to non-hierarchical data stored in matrices, and, in particular, cache misses are reduced considerably.
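The two ideas above, ordering grid cells along a space-filling curve and streaming their data through stacks, can be illustrated with a minimal sketch. It is not the authors' implementation. The Z-order (Morton) curve is used only because it is short to code, the grid is a uniform 2D grid rather than an adaptive space-tree, and the names morton_key, linearize, Stack, and sweep are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Cell = Tuple[int, int]  # integer (x, y) index of a grid cell (assumed layout)


def morton_key(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of ix and iy into a Z-order index."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)      # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return key


def linearize(cells: Iterable[Cell]) -> List[Cell]:
    """Return the cells sorted along the Z-order curve."""
    return sorted(cells, key=lambda c: morton_key(c[0], c[1]))


class Stack:
    """Stack over a contiguous list; only push and pop touch the data."""

    def __init__(self) -> None:
        self._items: list = []

    def push(self, item) -> None:
        self._items.append(item)   # put data on top of the pile

    def pop(self):
        return self._items.pop()   # take data from the top of the pile

    def __len__(self) -> int:
        return len(self._items)


def sweep(source: Stack, update: Callable[[float], float]) -> Stack:
    """Pop every (cell, value) pair from `source`, apply `update` to the
    value, and push the result onto a fresh stack that is returned."""
    target = Stack()
    while len(source) > 0:
        cell, value = source.pop()
        target.push((cell, update(value)))
    return target


if __name__ == "__main__":
    cells = [(x, y) for y in range(2) for x in range(2)]
    source = Stack()
    # Push in reverse curve order so that pops deliver cells in curve order.
    for cell in reversed(linearize(cells)):
        source.push((cell, 1.0))
    result = sweep(source, lambda v: 2.0 * v)
    print(len(result), "cells processed")
```

In this sketch every access goes to the top of a stack, so memory is touched strictly sequentially. After the sweep, the returned stack holds the cells in reverse curve order, so a following sweep that pops from it would walk the curve backwards.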
Even the implementation of multi-grid cycles and/or higher-order discretizations, as well as the parallelization of the whole algorithm, becomes easy and straightforward on these data structures and does not worsen the cache efficiency.

¹ push puts data on top of a pile and pop takes data from the top of a pile.

J. Dongarra, K. Madsen, and J. Waśniewski (Eds.): PARA 2004, LNCS 3732, pp. 874–882, 2005. © Springer-Verlag Berlin Heidelberg 2005

In the literature, space-filling curves are a well-known device for constructing efficient grid partitionings for data-parallel implementations of the numerical solution of partial differential equations [13–16, 19, 23, 24]. It is also known that, owing to the locality properties of the curves, reordering grid cells according to the numbering induced by a space-filling curve improves cache efficiency (see e.g. [1]). Similar benefits of reordering data along space-filling curves can also be observed for other applications such as matrix