International Journal of Computer Applications (0975 – 8887)
Volume 95 – No. 15, June 2014

Parallel Implementations for Solving Shortest Path Problem using Bellman-Ford

Gaurav Hajela
Department of Computer Science and Engineering
Maulana Azad National Institute of Technology
Bhopal, India

Manish Pandey
Department of Computer Science and Engineering
Maulana Azad National Institute of Technology
Bhopal, India

ABSTRACT
In this paper, different parallel implementations of the Bellman-Ford algorithm on the GPU using OpenCL are presented. They include two variants of Bellman-Ford for the single-source shortest path (SSSP) problem and one variant of Bellman-Ford for the all-pairs shortest path (APSP) problem. A comparative analysis of their performance on the CPU and the GPU is also discussed. Write-write conflicts in Bellman-Ford are resolved using the synchronization mechanisms available in OpenCL and by explicit synchronization obtained by modifying the algorithm. The proposed algorithms achieve an average speedup of 13.8x for parallel Bellman-Ford for SSSP and an average speedup of 18.5x for Bellman-Ford for APSP.

Keywords
Shortest path problem, OpenCL, Graphics processing unit (GPU).

1. INTRODUCTION
The single-source shortest path problem has applications in many scientific and real-world domains. Common applications include network routing [6], VLSI design, robotics and transportation; shortest path algorithms are also used to compute directions between physical locations, as in Google Maps. All of these applications generally involve non-negative weights, but in some applications weights can be negative, such as currency exchange arbitrage and other areas where an edge represents something other than merely the distance between two entities. The Bellman-Ford algorithm can be used in such application areas. The Bellman-Ford algorithm [12] works on graphs with negative edge weights and can also detect negative cycles, cases in which many other algorithms, such as Dijkstra's, fail.
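Since negative weights and negative-cycle detection are what set Bellman-Ford apart, a small sequential sketch may help make them concrete. It is written in Python for illustration only and is not the authors' OpenCL implementation. It assumes the adjacency-matrix form of Section 1.1, with Cost[u][v] = ∞ for a missing edge and Cost[v][v] = 0, adds one extra relaxation pass to detect a negative cycle, and models currency arbitrage with the common -log(rate) transformation. The names bellman_ford, all_pairs and has_arbitrage, the EPS tolerance and the exchange-rate table are hypothetical.

```python
import math

INF = math.inf
EPS = 1e-12  # assumed tolerance so floating-point rounding is not mistaken for a negative cycle


def bellman_ford(s, cost):
    """Sequential Bellman-Ford on an adjacency matrix, mirroring Algorithm 1.

    Assumes cost[u][v] is the weight of edge (u, v), INF when there is no edge,
    and cost[v][v] == 0. Returns the list of distances from s, or None if a
    negative cycle is reachable from s.
    """
    n = len(cost)
    dist = list(cost[s])  # Dist[i] = Cost[s, i]: direct edges from the source
    for _ in range(n - 1):  # outer loop runs n-1 times
        # Scanning the whole matrix visits every edge (u, v): O(n^2) per pass.
        for u in range(n):
            for v in range(n):
                if dist[u] + cost[u][v] < dist[v]:  # Relax(u, v)
                    dist[v] = dist[u] + cost[u][v]
    # Extra pass: if an edge can still be relaxed, a negative cycle is reachable from s.
    for u in range(n):
        for v in range(n):
            if dist[u] + cost[u][v] < dist[v] - EPS:
                return None
    return dist


def all_pairs(cost):
    """APSP by calling bellman_ford once per source vertex: O(n^4) overall."""
    return [bellman_ford(s, cost) for s in range(len(cost))]


def has_arbitrage(rates):
    """rates[i][j] = units of currency j bought with one unit of currency i.

    Assumes 0 means "no market" and rates[i][i] == 1. A cycle whose rate product
    exceeds 1 has a negative sum of -log(rate), so arbitrage is exactly a
    negative cycle in the transformed graph.
    """
    n = len(rates)
    cost = [[-math.log(r) if r > 0 else INF for r in row] for row in rates]
    return any(bellman_ford(s, cost) is None for s in range(n))
```

For example, has_arbitrage([[1, 0.9, 0], [0, 1, 0.9], [1.3, 0, 1]]) returns True, because trading around the cycle 0 → 1 → 2 → 0 multiplies the amount by 0.9 × 0.9 × 1.3 ≈ 1.053 > 1. The triple loop over the matrix is deliberately the same O(n^3) structure as Algorithm 1, which is the part a GPU version would parallelize.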
Bellman-Ford is also used in wireless sensor networks and other ad hoc networks, where its distributed form, distributed Bellman-Ford [7], can be applied. Distributed Bellman-Ford was also the first ARPANET routing algorithm, in 1969 [14].

Most of the application areas above are real-time applications that need results quickly, so the performance of the algorithm must be improved to reduce both its running time and its power consumption. Parallel computing on the GPU is one technology that delivers high-performance computing at a reasonable cost with considerable speedup. GPUs are now used for a variety of purposes beyond graphics processing and gaming. This use is known as general-purpose computing on graphics processing units (GPGPU) [10]: GPUs provide high-performance computing and can be programmed with standard frameworks such as OpenCL and CUDA. OpenCL [11] is an open framework that runs on GPUs from different vendors, whereas CUDA targets NVIDIA GPUs only. OpenCL is therefore used for the GPU implementation in this paper because of its portability and openness.

1.1 Bellman-Ford Algorithm
Consider a graph G(n,E,V), where n is the number of vertices, E is the set of edges and V is the set of vertices. An adjacency matrix representation of the graph is used here, as it is well suited to the GPU. Cost is the adjacency matrix of the graph, with Cost[u,v] = ∞ when there is no edge (u,v) and Cost[v,v] = 0. Initially, Dist contains the weights of the direct edges from the source s. After the k-th iteration of the outer loop, Dist[v] is no greater than the length of the shortest path from s to v that uses at most k+1 edges (that is, at most k intermediate vertices). Finally, when the algorithm completes, Dist contains the shortest-path distance from s to every vertex v in V, provided that no negative cycle is reachable from s. For each edge (u,v) in E, Relax(u,v) is called n-1 times, so Relax() is called |E|(n-1) times in total; the majority of the algorithm's running time is therefore spent in this procedure. The Bellman-Ford algorithm is shown in Algorithm 1.

Algorithm BellmanFord(s, Dist, Cost, n)
{
1.  for i = 1 to n do
2.      Dist[i] = Cost[s,i];
3.  End for
4.  for k = 1 to n-1 do
5.      for each (u,v) in E do
6.          Relax(u,v)
7.      End for
8.  End for
}

Relax(u,v)
{
1.  if Dist[v] > Dist[u] + Cost[u,v]
2.      Dist[v] = Dist[u] + Cost[u,v]
}

Algorithm 1: Algorithm for Bellman-Ford.

With the adjacency matrix representation, enumerating every edge (u,v) in E requires scanning all n^2 entries of Cost in each of the n-1 iterations, so the time complexity of the above algorithm is O(n^3).

All-pairs shortest paths can also be computed with Bellman-Ford by calling the above algorithm once with each vertex of the graph as the source, which takes O(n^4) time with the adjacency matrix representation:

For each s in V
    Call BellmanFord(s, Dist, Cost, n);
End for

Organization of the paper: Section 2 discusses previously modified algorithms along with the improvements made to the Bellman-Ford algorithm by different authors. Section 3 presents the parallelism identified in the standard Bellman-Ford algorithm and the write-write conflicts that arise when parallelizing it. In Section 4, the proposed parallel algorithm along with the OpenCL kernel is