IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING

Effective connectivity analysis in brain networks: a GPU-Accelerated Implementation of the Cox Method

Vafa Andalibi, Francois Christophe, Teemu Laukkarinen, Tommi Mikkonen

Paper submitted for review on October 31, 2015. This work is supported by the Academy of Finland under Project: Bio-Integrated Software development for Adaptive Sensor Networks, project number 278882.
V. Andalibi is with the Department of Electronics and Communications and the Department of Pervasive Computing, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland (e-mail: vafa.andalibi@tut.fi).
F. Christophe is with the Department of Pervasive Computing, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland (e-mail: francois.christophe@tut.fi).
T. Laukkarinen is with the Department of Pervasive Computing, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland (e-mail: teemu.laukkarinen@tut.fi).
T. Mikkonen is with the Department of Pervasive Computing, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland (e-mail: tommi.mikkonen@tut.fi).
Copyright (c) 2016 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Abstract—Observing the interactions between the neurons of a network can reveal how information is processed within that network. Such observation can be achieved by analyzing causality between the activities of the different neurons in the network; this analysis is called effective connectivity analysis. However, existing methods for such analysis are either too computationally heavy for daily use or too inaccurate to yield reliable results. The Cox method produces reliable results, but the computation takes hours on CPUs, which makes it slow to use in research. In this paper, two algorithms are presented that speed up the Cox method by parallelizing the computation on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA) platform. Both algorithms are evaluated with respect to network size and recording duration. The main interest of a GPU implementation is the gain in computation time, but another important benefit is that such an implementation requires rethinking the algorithm differently from the sequential implementation. This rethinking itself brings new optimization possibilities, e.g., by employing OpenCL. Using this accelerated implementation, the Cox method is then applied to an experimental dataset from CRCNS on a personal computer. This should facilitate observations of biological neural network organization that can provide new insights to improve the understanding of memory, learning, and intelligence.

Index Terms—Parallel algorithms, Parallel processing, Maximum likelihood estimation, Biological neural networks, Complex networks, Topology analysis

I. INTRODUCTION

The observation of the organization of biological neural networks plays a significant role in the development of innovative topologies of Spiking Neural Networks (SNNs) [1], which are computationally more powerful than Artificial Neural Networks [2]. The nonlinear dynamics of neuronal networks within the neuroanatomical substrate (structural connectivity) of complex nervous systems, e.g., the brain, produce patterns of statistical dependencies as well as causal interactions [3]. The former, i.e., functional connectivity, captures the statistical dependencies among spatially remote neurophysiological events, and the latter, i.e., effective connectivity, explains the causal effects of one neural system over another [4].

The functional analysis of neural connections [5], [6] and connection changes [7] plays an important part in this observation for two reasons. First, this analysis enables the observation of causal relations between input stimuli and the activation of paths in a neural network, thus uncovering response patterns to specific events. Second, this analysis makes it possible to establish a strong relation between the structure of a network and its functionality. Based on the hypothesis of a strong correlation between a network's function and its structure, the analysis of temporal connectivity between neurons can be used to reproduce networks exhibiting complex functionalities such as face recognition, natural language processing, or complex tasks requiring deep machine learning [8]–[11].

However, accurate methods for effective connectivity analysis, such as methods based on Maximum Likelihood (ML) estimation, are computationally expensive [12], [13]. Running such an analysis on a personal computer typically requires hours of computing. This slows down the development of novel ideas inspired by such analysis of the complex behavior of biological neural networks.

Trading the accuracy of effective connectivity analysis for faster computation by using simpler methods can lead to important misconceptions. For example, Cross-Correlation Function (CCF) analysis [14], a computationally simpler method than ML estimation, cannot distinguish direct from indirect paths between two nodes of a network [15], whereas the ML-estimation-based Cox method can [16].

Graphics Processing Units (GPUs) are not only capable of massive parallelization and number crunching but also have very high energy efficiency compared to CPUs. The Compute Unified Device Architecture (CUDA) makes it possible to execute parallel programs on the GPU of a personal computer or laptop. This provides High Performance Computing to a wider