Optimizing D-GM quantum computing by exploring parallel and distributed quantum simulations under GPUs architecture

Anderson Avila, Renata H. S. Reiser, Mauricio L. Pilla and Adenauer C. Yamin
Centre for Technological Development
Federal University of Pelotas
Pelotas, 1 Gomes Carneiro St.
Email: {avila, reiser, pilla, adenauer}@inf.ufpel.edu.br

Abstract—The exponential increase in temporal and spatial complexity is one of the main challenges to the widespread use of quantum algorithm simulation, especially for dense quantum transformations (QTs) such as the Hadamard transformation (H), which has found wide application in computer and communication science and is part of the simplest universal set of QTs. The main reason for these costs is the expansion of QTs by tensor products in multi-dimensional quantum applications. In this work, new optimizations for the execution of reduction and decomposition based on the Identity operator are introduced in the Distributed Geometric Machine (D-GM) framework. Instead of executing a quantum transformation in a single step, it is divided into sub-quantum transformations, and only the values that differ from Identity transformations are stored. Mixed Partial Processes provide control over the growth in the size of read/write memory states in the calculation of a QT, thus increasing the scalability of applications with respect to the GPU memory limit. In the evaluation of this D-GM extension, Hadamard transformations of up to 28 qubits were simulated on a single GPU. Our new simulator is up to 10,829× faster and allows the simulation of more qubits than our previous implementation running on the same GPU.

Index Terms—quantum computing, quantum simulation, Hadamard gate, D-GM

I. INTRODUCTION

Quantum Computing (QC) is a new paradigm with the potential of exploiting quantum mechanics to provide unsurpassed parallelism.
However, quantum computers are still in their early days and cannot reliably provide more than a few quantum bits, or qubits. Until quantum computers become widely available, the development and testing of new algorithms for these systems may be done by analytical processes or by simulation. Although iterative simulation provides many advantages over analytical processes, such as ease of use and correctness, the simulation of quantum computing on classical computers is a demanding task in terms of both temporal and spatial complexity.

The two basic building blocks of quantum computers are the qubit and the QT. As the former is represented as matrices and each QT is an operation between two matrices, the resulting matrices grow exponentially with the number of qubits.

Simulators have been exploring the potential of massively parallel architectures such as those found in GPUs [1]. Naively executing the operations as matrix multiplications already provides high speedups compared to general-purpose CPUs, but the memory required limits further scalability.

Many applications of quantum concepts in Computational Intelligence dealing with robot sensing systems and robotics automation have been developed in recent years [2]–[4]. In these applications, interactions with various aspects of Computational Intelligence, such as neural networks, Bayesian networks, logic networks, fuzzy logic, and state machines, can be extended to those based on quantum circuits. Following these new paradigms, the D-GM framework has been employed in simulations in the areas of quantum computing, quantum fuzzy logic, quantum fuzzy computing systems, and computational intelligence [5]–[8].

In this work, the representation of quantum transformations is improved by the use of the Identity operator (ID operator) and by the splitting of operations that do not depend on each other.
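The idea of splitting a transformation into Identity-padded sub-transformations can be illustrated with a minimal Python sketch. It is not the D-GM implementation: the function names (`dense_hadamard`, `apply_single_qubit`, `decomposed_hadamard`) and the bit-ordering convention are assumptions made only for this illustration. The sketch applies an n-qubit Hadamard either as one dense 2^n × 2^n matrix or as n sub-steps of the form I ⊗ ... ⊗ H ⊗ ... ⊗ I, where only the 2 × 2 non-Identity block is stored.

```python
import math

# 2x2 Hadamard gate: the only non-Identity values that need to be stored.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def kron(a, b):
    """Dense tensor (Kronecker) product of two matrices (nested lists)."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def dense_hadamard(n):
    """Single-step approach: build the full 2^n x 2^n matrix H (x) ... (x) H."""
    m = [[1.0]]
    for _ in range(n):
        m = kron(m, H)
    return m


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def apply_single_qubit(gate, state, target, n):
    """Apply I (x) ... (x) gate (x) ... (x) I without building that matrix.

    Identity factors are implicit: amplitudes are only combined in pairs
    whose indices differ in the target bit.
    Assumption: qubit 0 is the most significant bit of the state index.
    """
    stride = 1 << (n - 1 - target)
    out = list(state)
    for base in range(len(state)):
        if base & stride:
            continue  # each pair is handled once, from its 0-branch
        a0, a1 = state[base], state[base | stride]
        out[base] = gate[0][0] * a0 + gate[0][1] * a1
        out[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def decomposed_hadamard(state, n):
    """Decomposed approach: n sub-steps, one per qubit."""
    for q in range(n):
        state = apply_single_qubit(H, state, q, n)
    return state


if __name__ == "__main__":
    n = 3
    state = [0.0] * (1 << n)
    state[5] = 1.0  # basis state |101>
    dense = matvec(dense_hadamard(n), state)
    split = decomposed_hadamard(state, n)
    print(max(abs(a - b) for a, b in zip(dense, split)))
```

In this sketch the dense route stores 4^n matrix entries, whereas the decomposed route stores only the 2^n amplitudes of the state plus the four entries of H, at the cost of n passes over the state instead of one.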
Although the number of steps required for simulation increases, simulation time is reduced, because the sub-steps combined execute faster than the complete operation. Moreover, these sub-steps may be executed in parallel and distributed among GPUs.

These two approaches to improving quantum simulation performance were implemented in the Distributed Geometric Machine (D-GM) framework and experimentally evaluated with Hadamard gate simulations from 21 to 28 qubits, with different sub-step sizes. The best relative speedup of 10,829 was achieved with 22 qubits, showing great potential for accelerating quantum computing simulations in more complex setups.

The Hadamard transformation has found wide application in computer and communication science, since it can be generalized to higher dimensions as the Quantum Fourier Transformation and is closely related to Shor’s fast algorithms for factoring and discrete logarithms [9], [10]. Additionally, it is significant for analysing quantum algorithms and generating superposition of the amplitudes up to a sign or a