A Novel Approach for Detecting Heap-based Loop-carried Dependences*

A. Tineo, F. Corbera, A. Navarro, R. Asenjo, and E.L. Zapata
Dpt. of Computer Architecture, University of Málaga, Complejo Tecnologico, Campus de Teatinos, E-29071 Málaga, Spain.
{tineo,corbera,angeles,asenjo,ezapata}@ac.uma.es

Abstract

The problem of data dependences in pointer-based codes is crucial to various compiler optimizations. The approach presented in this paper focuses on detecting data dependences induced by heap-directed pointers in loops that access dynamic data structures. Knowledge about the shape of the data structure accessible from a heap-directed pointer provides critical information for disambiguating the heap accesses originating from it. Our approach is based on a previously developed shape analysis that maintains topological information about the connections among the different nodes (memory locations) in the data structure. As a novelty, our approach carries out abstract interpretation of the statements being analyzed, annotating memory locations with read/write information. This information is later used in a very accurate dependence test, which we describe in this paper. We also discuss its application to three different programs: the sparse matrix-vector product, mst from Olden, and twolf from the SPEC CPU2000 suite.

1 Introduction

Optimizing and parallelizing compilers rely upon accurate static disambiguation of memory references, i.e., determining at compile time whether two given memory references always access disjoint memory locations. Unfortunately, the presence of aliases in pointer-based codes makes memory disambiguation a non-trivial issue. An alias arises in a program when there are two or more distinct ways to refer to the same memory location. The problem of calculating pointer-induced aliases, called pointer analysis, has received significant attention over the past few years [11], [3].
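The central idea of the abstract (annotating memory locations with read/write information and then testing those annotations for conflicts between iterations) can be illustrated with a small Python sketch. It is deliberately simplified and is not the authors' implementation: it walks a concrete linked list instead of interpreting statements over abstract shape graphs, and the names `Node`, `build_list`, `has_lcd`, `incr_in_place` and `copy_to_next` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical sketch: concrete enumeration of list nodes stands in for
# the abstract interpretation over shape graphs described in the paper.


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can live in sets
class Node:
    val: int = 0
    next: Optional["Node"] = None


# One loop iteration reports the heap locations it reads and writes.
Body = Callable[[Node], Tuple[Set[Node], Set[Node]]]


def build_list(n: int) -> Node:
    """Build a singly linked list with n >= 1 nodes and return its head."""
    head = Node()
    cur = head
    for _ in range(n - 1):
        cur.next = Node()
        cur = cur.next
    return head


def has_lcd(head: Node, body: Body) -> bool:
    """Annotate each node with the iterations that read/write it, then
    report a loop-carried dependence if some written node is touched
    by two different iterations."""
    reads: Dict[Node, Set[int]] = {}
    writes: Dict[Node, Set[int]] = {}
    p: Optional[Node] = head
    it = 0
    while p is not None:  # models: for (p = head; p != NULL; p = p->next)
        r, w = body(p)
        for n in r:
            reads.setdefault(n, set()).add(it)
        for n in w:
            writes.setdefault(n, set()).add(it)
        p = p.next
        it += 1
    for node, w_its in writes.items():
        # A write plus any access from a different iteration is a
        # flow, anti or output dependence carried by the loop.
        if len(w_its | reads.get(node, set())) > 1:
            return True
    return False


def incr_in_place(p: Node) -> Tuple[Set[Node], Set[Node]]:
    # p->val = p->val + 1: each iteration touches only its own node
    return {p}, {p}


def copy_to_next(p: Node) -> Tuple[Set[Node], Set[Node]]:
    # if (p->next) p->next->val = p->val: iteration i writes node i+1,
    # which iteration i+1 then reads
    return {p}, ({p.next} if p.next is not None else set())
```

With a five-node list, `has_lcd(build_list(5), incr_in_place)` returns `False`, while `has_lcd(build_list(5), copy_to_next)` returns `True`, because the node written in one iteration is read in the next.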
* This work was supported in part by the Ministry of Education of Spain under contract TIC2003-06623.

Pointer analysis can be divided into two distinct subproblems: stack-directed analysis and heap-directed analysis. We focus our research on the latter, which deals with objects dynamically allocated in the heap. An important body of work has recently been conducted on this kind of analysis. A promising approach to deal with dynamically allocated structures consists of explicitly abstracting the dynamic store in the form of a bounded graph. In other words, the heap is represented as a storage shape graph, and the analysis tries to capture some shape properties of the heap data structures. This type of analysis is called shape analysis, and in this context our research group has developed a powerful shape analysis framework [2].

The approach presented in this paper focuses on detecting data dependences induced by heap-directed pointers in loops that access pointer-based dynamic data structures. In particular, we are interested in detecting the loop-carried dependences (henceforth referred to as LCDs) that may arise between the statements in two different iterations of the loop. Knowledge about the shape of the data structure accessible from heap-directed pointers provides critical information for disambiguating the heap accesses originating from them in different iterations of a loop, and hence for proving that there are no data dependences between iterations.

Until now, most LCD detection techniques based on shape analysis [3], [6] have used a coarse characterization of the data structure being traversed (Tree, DAG, or Cycle) as their shape information. One advantage of this type of analysis is that it enables faster data-flow merge operations and reduces the storage requirements of the analysis.
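A coarse Tree/DAG/Cycle characterization of this kind can be sketched as a depth-first search over the pointer graph reachable from a single root. This Python sketch is a hypothetical illustration, not the algorithm of [3] or [6]: the function `classify_shape` and the dictionary-of-successors graph encoding are assumptions. It reports Cycle if a back edge exists, DAG if the graph is acyclic but some node has two or more incoming edges, and Tree otherwise.

```python
from typing import Dict, Hashable, List, Set


def classify_shape(succ: Dict[Hashable, List[Hashable]], root: Hashable) -> str:
    """Hypothetical coarse shape label for the structure reachable from root.

    succ maps each node to the nodes its pointer fields point to.
    """
    visiting: Set[Hashable] = set()  # nodes on the current DFS path
    done: Set[Hashable] = set()      # fully explored nodes
    indegree: Dict[Hashable, int] = {}
    cyclic = False

    def dfs(u: Hashable) -> None:
        nonlocal cyclic
        visiting.add(u)
        for v in succ.get(u, []):
            # Each node is explored once, so each edge is counted once.
            indegree[v] = indegree.get(v, 0) + 1
            if v in visiting:
                cyclic = True  # back edge (including self-loops)
            elif v not in done:
                dfs(v)
        visiting.discard(u)
        done.add(u)

    dfs(root)
    if cyclic:
        return "Cycle"
    if any(d > 1 for d in indegree.values()):
        return "DAG"  # acyclic, but some node is shared
    return "Tree"
```

For example, a doubly linked list encoded as `{1: [2], 2: [1, 3], 3: [2]}` is classified as Cycle, even though a loop that only follows `next` fields visits each node once; a single label for the whole structure cannot express that distinction.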
However, this coarse characterization also causes a loss of accuracy in the detection of data dependences, especially when the data structure being visited is not a “clean” tree, contains cycles, or is modified during the traversal.

Our approach, on the contrary, is based on a shape analysis that maintains topological information about the connections among the different nodes (memory locations) in the data structure. In fact, our representation of the data structure provides a more accurate description of the memory locations reached when a statement is executed inside a loop. Moreover, as we will see in the following sections, our shape analysis is based on the abstract interpretation of the program statements over the graphs that represent the data structure at each program point. In other words, our ap-