Abstract— In this research, we use neural tuning to quantify the neural representation of a prosthetic arm's actions in a new BMI framework based on Reinforcement Learning (RLBMI). We observed that, through closed-loop brain control, the neural representation changed to encode the robot actions that maximize reward. This result is notable because in our paradigm robot actions are directly controlled by a Computer Agent (CA) whose reward states are compatible with the user's rewards. Through co-adaptation, neural modulation is used to establish the value of robot actions for achieving reward.

I. INTRODUCTION

Brain machine interface (BMI) technologies provide an alternative means of communication and control that bypasses natural sensory and motor physiologic pathways. We have recently introduced a new framework (RLBMI) for BMIs based on reinforcement learning, which continuously adapts with the user during brain control [1, 2]. In this paradigm, the user is rewarded for generating neural activations that produce behaviors which lead to task completion. Specifically, in our RLBMI, neural modulation is used to estimate a value function that helps choose actions in a grid world to maximize reward returns. We are interested in quantifying the changes in M1 relative to the changes in prosthetic control in the user's workspace environment [3]. Traditionally, changes in the functional organization and neural representation have been quantified using the formalism of directional tuning [4, 5]. Directional tuning has been used in the past to provide two quantification metrics for motor tasks: tuning direction specifies what a neuron is most correlated with (e.g., hand velocity), and tuning depth specifies how strong that correlation is. We seek to extend this theory to BMIs in a closed-loop brain control experiment where the user and the decoding algorithm are co-adapting through experience.
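Tuning direction and depth are conventionally estimated by fitting a cosine tuning model, rate = b0 + m·cos(θ − θpd), to a unit's firing rates across movement directions. The sketch below shows one standard way to do this by least squares; it is an illustrative reconstruction under that assumption, not the paper's own analysis code, and all function names and the synthetic data are ours.

```python
import numpy as np

def fit_cosine_tuning(directions, rates):
    """Least-squares fit of the cosine tuning model
    rate = b0 + b1*cos(theta) + b2*sin(theta),
    which is equivalent to rate = b0 + m*cos(theta - theta_pd).
    Returns (preferred_direction_rad, tuning_depth, baseline)."""
    X = np.column_stack([np.ones_like(directions),
                         np.cos(directions),
                         np.sin(directions)])
    b0, b1, b2 = np.linalg.lstsq(X, rates, rcond=None)[0]
    pd = np.arctan2(b2, b1)   # tuning (preferred) direction theta_pd
    depth = np.hypot(b1, b2)  # tuning depth m: modulation amplitude
    return pd, depth, b0

# Synthetic check: a unit with preferred direction 45 deg,
# tuning depth 5 Hz, and baseline rate 10 Hz.
theta = np.linspace(0, 2 * np.pi, 16, endpoint=False)
rates = 10 + 5 * np.cos(theta - np.pi / 4)
pd, depth, base = fit_cosine_tuning(theta, rates)
```

Fitting the (cos, sin) pair instead of θpd directly keeps the regression linear; the preferred direction and depth are then recovered from the two coefficients.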
We hypothesize that skilled BMI use will modify the neural representation of the BMI user, and that tuning can serve as a metric to quantify whether the neural representation changes and by how much. Quantifying neural representation in a BMI through directional tuning adds a new perspective compared to traditional behavioral motor learning experiments [6, 7]. In both situations, the user must learn how to control an appendage (natural or artificial). However, in a BMI the normal physiologic pathways are replaced by artificial actuators (e.g., robot, CA) which complete the task. Since the actuators are artificial, we have direct knowledge of all BMI control signals throughout learning. However, it remains unknown in the RLBMI how the normal neuronal activation is remapped to the decoding system variables and the new environmental workspace. Since the decoding evolves through experience, we have the opportunity to find causal relationships. Specifically, we address the following questions in the RLBMI context: Does neural tuning direction change as the user learns to control the robot? Does neuronal tuning depth change throughout learning?

This work was supported in part by the U.S. National Science Foundation under Grant #CNS-0540304, the Children's Miracle Network, and the UF Alumni Association Fellowship. B. Mahmoudi and J. DiGiovanna are with the Department of Biomedical Engineering, University of Florida, 106 BME Building, Gainesville, FL 32611 USA (e-mail: {babakm, jfd134}@ufl.edu). J. C. Principe is with the Department of Electrical and Computer and Biomedical Engineering, NEB 451, University of Florida, Gainesville, FL 32611 USA (e-mail: principe@cnel.ufl.edu). J. C. Sanchez is with the Department of Pediatrics, Division of Neurology, University of Florida, P.O. Box 100296, JHMHC, Gainesville, FL 32610 USA (e-mail: jcs77@ufl.edu).

II. METHODS

A.
RLBMI Experiment

We briefly summarize the experimental paradigm, neural data acquisition, and the computational framework of the RLBMI [1, 2]. To investigate the RLBMI, we designed an operant conditioning paradigm in which three male Sprague Dawley rats were trained to control a robotic arm and maneuver it to press levers in the robotic workspace (see Fig. 1 for an overview). The RLBMI decodes robot actions from the user's neuronal modulations. These modulations are recorded bilaterally with 16 microwire electrodes chronically implanted in each hemisphere of primary motor cortex (M1), for a total of 32 electrodes. Single-neuron action potentials were detected and sorted using standard techniques [8], and 29 single units were discriminated for the rat in this study. Surgical details are given in [9], and signal acquisition and processing details in [1]. Modulations consist of the firing rate of each discriminated neuron, estimated in non-overlapping 100 ms bins. We examine tuning in the brain-control portion of this experiment; a timeline of this control is given in Fig. 2. Briefly, the rats were required to control the robotic arm (see Fig. 1) using only neuronal modulations from the single units we identified. A target side was randomly selected and, according to the selected side, the trial was labeled as left or right. Every 100 ms, the CA must select which robot control action to take given the

Neuronal Tuning in a Brain-Machine Interface during Reinforcement Learning
Babak Mahmoudi, Student Member, Jack DiGiovanna, Student Member, Jose C. Principe, Fellow, and Justin C. Sanchez, Member, IEEE
30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, August 20-24, 2008