A Cache-Aware Algorithm for PDEs on Hierarchical Data Structures

Frank Günther, Miriam Mehl, Markus Pögl, and Christoph Zenger
Institut für Informatik, TU München
Boltzmannstraße 3, 85748 Garching, Germany
{guenthef,mehl,poegl,zenger}@in.tum.de

Abstract. A major challenge in implementing up-to-date simulation software for various applications is to combine highly efficient mathematical methods on the one hand with an efficient use of modern computer architectures on the other. We concentrate on the solution of PDEs and demonstrate how to overcome the resulting dilemma between cache efficiency and modern multilevel methods on adaptive grids. Our algorithm is based on stacks, the simplest possible and thus very cache-efficient data structures.

1 Introduction

In most implementations, competitive numerical algorithms for solving partial differential equations incur a non-negligible overhead in data access and thus cannot exploit the high performance of processors satisfactorily. This is mainly caused by the tree structures used to store the hierarchical data needed for methods such as multi-grid and adaptive grid refinement.

We use space-filling curves as an ordering mechanism for our grid cells and, based on this order, replace the tree structure by data structures that are processed linearly. For this, we restrict ourselves to grids associated with space-trees (which allow local refinement) and to (in a certain sense) local difference stencils. In fact, the only data structures used in our implementation are a fixed number of stacks. Since stacks can be considered the simplest data structures used in computer science, allowing only the two basic operations push and pop¹, data access becomes very fast (even faster than the usual access to non-hierarchical data stored in matrices), and, in particular, cache misses are reduced considerably.
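To make the ordering idea concrete, the following Python sketch numbers the cells of a uniform 2^k × 2^k grid along a Hilbert curve and sorts the cells by that number. It is a minimal illustration under stated assumptions, not the authors' implementation: the Hilbert curve is chosen only as a familiar example (this passage does not say which space-filling curve the authors use), the grid is uniform rather than an adaptive space-tree grid, and the names `hilbert_index` and `hilbert_order` are hypothetical. The index computation follows the standard iterative quadrant-rotation scheme for the Hilbert curve.

```python
from typing import List, Tuple

# Hypothetical helper names; an illustration of space-filling-curve ordering,
# not the paper's code. Assumes a uniform n x n grid with n a power of two.


def hilbert_index(n: int, x: int, y: int) -> int:
    """Return the position of cell (x, y) along a Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant (at the current scale s) contains the cell?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # All cells of the quadrants visited earlier come first.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve in this quadrant has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n: int) -> List[Tuple[int, int]]:
    """List all grid cells in the order induced by the Hilbert curve."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))


if __name__ == "__main__":
    print(hilbert_order(4)[:4])  # → [(0, 0), (1, 0), (1, 1), (0, 1)]
```

In this order, consecutive cells always share an edge, which is the locality property that makes curve-ordered, linearly processed data cache-friendly. A solver along the lines described here would additionally have to handle adaptive refinement and the stack-based data access, both of which this sketch omits.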
Even the implementation of multi-grid cycles and/or higher-order discretizations, as well as the parallelization of the whole algorithm, becomes easy and straightforward on these stack-based data structures and does not impair cache efficiency.

¹ push puts data on top of a pile; pop takes data from the top of a pile.

J. Dongarra, K. Madsen, and J. Waśniewski (Eds.): PARA 2004, LNCS 3732, pp. 874–882, 2005.
© Springer-Verlag Berlin Heidelberg 2005

In the literature, space-filling curves are a well-known device for constructing efficient grid partitionings for data-parallel implementations of the numerical solution of partial differential equations [13–16, 19, 23, 24]. It is also known that, owing to the locality properties of the curves, reordering grid cells according to the numbering induced by a space-filling curve improves cache efficiency (see, e.g., [1]). Similar benefits of reordering data along space-filling curves can also be observed for other applications such as matrix