Optimization techniques for parallel irregular reductions ☆

E. Gutiérrez *, O. Plata, E.L. Zapata

Department of Computer Architecture, University of Malaga, P.O. Box 4114, E-29080 Malaga, Spain

Abstract

Different parallelization techniques have been proposed in the literature for irregular reductions in the context of shared-memory multiprocessors. They can be classified into two broad families: those based on privatization of the reduction arrays and those based on partitioning of the reduction arrays. Methods in the first family are simple, but they exploit no data locality and scale poorly in memory. Methods in the second family are more complex because they require an inspection phase, but they exploit data locality and scale better in memory. Partitioning-based methods perform well on a wide variety of irregular codes. However, some specific input data patterns can degrade their performance: such access patterns may reduce the parallelism exploited by the method or introduce workload imbalance. To mitigate these negative effects, we propose three optimizations for a specific partitioning-based method (DWA–LIP). These optimizations aim to increase the exploited parallelism, balance the workload, and reduce the effect of regions of the reduction arrays with a high degree of contention. Efficient implementations of the proposed optimizations for the DWA–LIP method have been tested experimentally and compared with other methods for parallelizing irregular reductions.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Irregular reductions; ccNUMA shared memory multiprocessors; Data locality; Privatization; Partitioning; Load balance

1. Introduction

Many scientific and engineering applications are classified as irregular. This class of applications is characterized by the use of indirections to access data in memory.
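The following Python code is a minimal sketch of what an irregular reduction through indirections looks like. It assumes a simplified reading of the histogram loop described in the next paragraph; Fig. 1 itself is not reproduced here.

- `irregular_reduction` applies A(fk(i)) += ck(i) for every iteration i and every indirection array fk. The contributions ck are hypothetical inputs standing in for whatever the loop body of Fig. 1 computes.
- `privatized_reduction` is a generic illustration of the privatization family named in the abstract, not the DWA–LIP method. Each thread accumulates into its own full copy of A, and the copies are summed at the end. This is why memory grows with the number of threads and why no locality in A is exploited.

Both function names are hypothetical. The threads come from Python's standard `concurrent.futures`. Under CPython's global interpreter lock, the sketch shows the data layout rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def irregular_reduction(a_size: int,
                        indirections: Sequence[Sequence[int]],
                        contributions: Sequence[Sequence[float]]) -> List[float]:
    """Sequential histogram reduction: A(fk(i)) += ck(i) for every k and i.

    indirections[k] plays the role of the indirection array f(k+1)();
    contributions[k] holds the (hypothetical) values added through it.
    """
    a = [0] * a_size
    n_iter = len(indirections[0])
    for i in range(n_iter):
        for f, c in zip(indirections, contributions):
            a[f[i]] += c[i]  # target element is only known at run time
    return a


def privatized_reduction(a_size: int,
                         indirections: Sequence[Sequence[int]],
                         contributions: Sequence[Sequence[float]],
                         n_threads: int) -> List[float]:
    """Privatization: each thread reduces into its own full copy of A."""
    n_iter = len(indirections[0])

    def worker(t: int) -> List[float]:
        local = [0] * a_size  # private copy: total memory is n_threads * a_size
        lo = t * n_iter // n_threads  # block distribution of the iterations
        hi = (t + 1) * n_iter // n_threads
        for i in range(lo, hi):
            for f, c in zip(indirections, contributions):
                local[f[i]] += c[i]  # no synchronization needed: copy is private
        return local

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        copies = list(pool.map(worker, range(n_threads)))

    # Merge phase: add the private copies element by element into A.
    return [sum(column) for column in zip(*copies)]


if __name__ == "__main__":
    f1, f2 = [0, 2, 2, 1], [1, 1, 3, 0]      # nInd = 2 indirection arrays
    c1, c2 = [1, 2, 3, 4], [10, 20, 30, 40]  # hypothetical contributions
    print(irregular_reduction(4, [f1, f2], [c1, c2]))      # [41, 34, 5, 30]
    print(privatized_reduction(4, [f1, f2], [c1, c2], 2))  # [41, 34, 5, 30]
```

With integer contributions, the privatized version returns exactly the sequential result for any thread count, because the reduction operator is associative and commutative. With floating-point contributions, the reordering of additions may change the last bits.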
Because of these indirections, memory access patterns are unknown at compile time. Reduction operations associated with the irregular memory accesses are very common in these applications. A reduction operation is defined by an associative and commutative operator acting on scalar variables (scalar reduction) or on array elements inside a loop (histogram reduction). Fig. 1 shows a prototype of a histogram reduction loop containing several irregular reductions. In that loop, a reduction array A() is updated through nInd indirection arrays, f1(), f2(), ..., fnInd(). Because the subscript arrays are loop-variant, loop-carried dependences may be present. However, these potential dependences can be resolved: since the reduction operator is associative and commutative, the updates to A() may be reordered and combined in any order.

☆ This work was supported by the Ministry of Education and Culture (CICYT), Spain, through grant TIC2000-1658.
* Corresponding author. Tel.: +34-952-13-2821; fax: +34-952-13-2790.
E-mail addresses: eladio@ac.uma.es (E. Gutiérrez), oscar@ac.uma.es (O. Plata), ezapata@ac.uma.es (E.L. Zapata).

Journal of Systems Architecture 49 (2003) 63–74, www.elsevier.com/locate/sysarc. ISSN 1383-7621. doi:10.1016/S1383-7621(03)00057-2

The properties of the memory access pattern in the histogram loop are completely defined by the