Central Force Optimization on a GPU: A Case Study in High Performance Metaheuristics using Multiple Topologies

Robert C. Green II, Lingfeng Wang, and Mansoor Alam
Department of Electrical Engineering and Computer Science
The University of Toledo
Toledo, OH, USA
Robert.Green3@utoledo.edu; Lingfeng.Wang@utoledo.edu; Mansoor.Alam2@utoledo.edu

Richard A. Formato
Cataldo & Fisher, LLC
400 TradeCenter, Suite 5900
Woburn, MA 01801
rf2@ieee.org

Abstract: Central Force Optimization (CFO) is a powerful new metaheuristic algorithm that has been demonstrated to be competitive with other metaheuristic algorithms such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Group Search Optimization (GSO). While CFO often shows superiority in terms of function evaluations and solution quality, the algorithm is complex and often requires increased computational time. In order to decrease CFO's computational time, we introduce the concept of local neighborhoods and implement CFO on a Graphics Processing Unit (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA) extensions for C/C++. Pseudo-Random CFO (PR-CFO) is examined using four test problems ranging from 30 to 100 dimensions. Results are compared and analyzed across four unique implementations of the PR-CFO algorithm: Standard, Ring, CUDA, and CUDA-Ring. The results demonstrate decreased computational time along with superior solution quality.

Keywords: central force optimization; graphics processing unit; CUDA; parallel computing; metaheuristics

I. INTRODUCTION

Metaheuristic algorithms, specifically those that may be classified as population based intelligent search (PIS), have proven to be quite useful in solving a wide range of problems across many fields. Among the reasons that these algorithms have proven so successful when applied to difficult problems is their inclusion of intelligence and parallelism.
Intelligence is typically achieved by mimicking some aspect of the natural world such as evolution, swarming, immune systems, and ant colonies, while parallelism is achieved through the use of populations that concurrently search for and sample possible solutions. Evolutionary algorithms (EA), particle swarm optimization (PSO), artificial immune systems (AIS), and ant colony optimization (ACO), respectively, are examples of algorithms built on these sources of inspiration. The intelligence that is added into these algorithms through this mimicry allows them to solve a wide variety of problems efficiently and accurately.

A second way to achieve parallelism is to use parallel and distributed computing platforms coupled with parallel programming tools such as MPI, OpenMP, and CUDA. This allows computationally intensive portions of the algorithms to be parallelized. The combination of these two types of parallelism, population based parallelism and data parallelism, allows the performance of these algorithms to be increased dramatically.

A recent development in this field is an algorithm called Central Force Optimization (CFO). CFO is a new optimization metaheuristic that has been proposed and developed in recent years [1]–[8]. CFO uses a population of probes that are spread across a search space. These probes are attracted towards each other by a gravitational pull that is determined by their fitness. The algorithm differentiates itself from other metaheuristics in that it includes no stochastic behavior. While this algorithm has been shown to be extremely promising in terms of solution quality and function evaluations, the computational time required to solve optimization problems is often high when compared with other algorithms. As such, the intent of this work is the introduction of a new CFO variation called Ring CFO and the implementation and analysis of this and other variations of CFO on the GPU using CUDA.

II. CENTRAL FORCE OPTIMIZATION

The CFO algorithm is a relatively new metaheuristic that is based on the movement of probes through space. These probes are scattered throughout the search space and, as time progresses, they slowly move towards the probe that has achieved the highest mass or fitness. The algorithm is based on three main equations, given in Equations (1), (2), and (3).

$$F = \gamma \frac{m_1 m_2}{r^2} \tag{1}$$

$$\vec{a} = \gamma \frac{m_2 \hat{r}}{r^2} \tag{2}$$

$$\vec{R}(t + \Delta t) = \vec{R}_0 + \vec{V}_0 \Delta t + \frac{1}{2} \vec{a} \Delta t^2 \tag{3}$$

978-1-4244-7835-4/11/$26.00 ©2011 IEEE
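As an illustration of Equations (2) and (3), the following is a minimal sketch in Python and not the implementation evaluated in this work. The function names `acceleration` and `step`, the default values of `gamma` and `dt`, and the convention that the unit vector points from the accelerated body towards the attracting mass are all assumptions made for this example.

```python
import math


def acceleration(r1, r2, m2, gamma=1.0):
    """Equation (2): acceleration of a body at r1 caused by mass m2 at r2.

    Assumption: r_hat points from r1 towards r2, so the body is attracted.
    """
    diff = [b - a for a, b in zip(r1, r2)]
    r = math.sqrt(sum(d * d for d in diff))
    if r == 0.0:
        # Coincident positions: no direction is defined, so apply no pull.
        return [0.0] * len(r1)
    # gamma * m2 * r_hat / r^2, with r_hat = diff / r
    scale = gamma * m2 / r ** 3
    return [scale * d for d in diff]


def step(r0, v0, a, dt=1.0):
    """Equation (3): position after a time step dt under constant acceleration."""
    return [x + v * dt + 0.5 * ai * dt * dt for x, v, ai in zip(r0, v0, a)]


# Hypothetical usage: a body at rest at the origin, attracted by a mass at (3, 4).
a = acceleration([0.0, 0.0], [3.0, 4.0], m2=50.0)
new_position = step([0.0, 0.0], [0.0, 0.0], a)
```

In this usage the separation is r = 5, so the magnitude of the acceleration is gamma * m2 / r^2 = 2, directed along the line joining the two positions.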