Abstract— In this study, we used neural tuning to
quantify the neural representation of a prosthetic arm's actions
in a new BMI framework based on reinforcement
learning (RLBMI). We observed that through closed-loop
brain control, the neural representation changed to encode
robot actions that maximize reward. This result is notable
because in our paradigm robot actions are directly
controlled by a Computer Agent (CA) whose reward states
are compatible with the user's rewards. Through co-adaptation,
neural modulation is used to establish the value of robot actions
for achieving reward.
I. INTRODUCTION
Brain machine interface (BMI) technologies provide an
alternative means of communication and control that
bypasses natural sensory and motor physiologic pathways.
We have recently introduced a new framework (RLBMI) for
BMIs based on reinforcement learning which continuously
adapts with the user during brain control [1, 2]. In this
paradigm, the user is rewarded for generating neural
activations that produce behaviors, which lead to task
completion. Specifically in our RLBMI, neural modulation
is used to estimate a value function to help choose actions in
a grid world that lead to maximizing reward returns. We are
interested in quantifying the changes in M1 relative to the
changes in prosthetic control in the user's workspace
environment [3].
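The core idea of value-function-based control can be illustrated with a toy example. The sketch below is a generic tabular Q-learning agent in a small grid world; it is a simplified, hypothetical stand-in for the RLBMI decoder (which maps neural states to action values), and every parameter here is illustrative rather than taken from the paper:

```python
import random

# Toy grid world: the agent starts at the center and is rewarded for
# reaching a "lever" cell. Illustrative only; not the paper's architecture.
N = 5                                            # grid width/height
START, GOAL = (2, 2), (4, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # left, right, down, up

alpha, gamma, eps = 0.1, 0.9, 0.1                # learning rate, discount, exploration
Q = {((x, y), a): 0.0 for x in range(N) for y in range(N) for a in range(4)}

def step(s, a):
    """Apply action a in state s; reward 1 only on reaching the goal."""
    dx, dy = ACTIONS[a]
    ns = (min(max(s[0] + dx, 0), N - 1), min(max(s[1] + dy, 0), N - 1))
    return ns, (1.0 if ns == GOAL else 0.0)

random.seed(0)
for episode in range(500):
    s = START
    while s != GOAL:
        # Epsilon-greedy action selection from the learned value function.
        if random.random() < eps:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda a_: Q[(s, a_)])
        ns, r = step(s, a)
        # Temporal-difference update toward reward-maximizing actions.
        best_next = max(Q[(ns, a_)] for a_ in range(4))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = ns

# After learning, the greedy action from START points right, toward the goal.
greedy = max(range(4), key=lambda a_: Q[(START, a_)])
print(ACTIONS[greedy])
```

As in the RLBMI, the agent never receives an explicit movement trajectory; it discovers reward-maximizing actions purely from the scalar reward signal.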
Traditionally, changes in the functional organization and
neural representation have been quantified using the
formalism of directional tuning [4, 5]. Directional tuning has
been used to provide two quantification metrics for motor
tasks: tuning direction specifies which movement variable a
neuron is most correlated with (e.g., hand velocity), and tuning
depth specifies how strong that correlation is. We seek to extend
this theory to BMIs in a closed-loop brain control
experiment where user and decoding algorithm are co-
adapting through experience. We hypothesize that through
skilled BMI use there will be a modification of the neural
representation of the BMI user. Tuning will be used as a
metric to quantify which changes in the neural representation
occur and how large they are.

This work was supported in part by the U.S. National Science Foundation
under Grant #CNS-0540304, the Children's Miracle Network, and the UF
Alumni Association Fellowship.
B. Mahmoudi and J. DiGiovanna are with the Department of Biomedical
Engineering, University of Florida, 106 BME Building, Gainesville, FL
32611 USA (e-mail: {babakm, jfd134}@ufl.edu).
J. C. Principe is with the Departments of Electrical and Computer
Engineering and Biomedical Engineering, NEB 451, University of Florida,
Gainesville, FL 32611 USA (e-mail: principe@cnel.ufl.edu).
J. C. Sanchez is with the Department of Pediatrics, Division of Neurology,
University of Florida, P.O. Box 100296, JHMHC, Gainesville, FL 32610
USA (e-mail: jcs77@ufl.edu).
Quantifying neural representation in a BMI through
directional tuning adds a new perspective compared to
traditional behavioral motor learning experiments [6, 7]. In
both situations, the user must learn how to control an
appendage (natural or artificial). However, in a BMI the
normal physiologic pathways are replaced by artificial
actuators (e.g. robot, CA) which complete the task. Since the
actuators are artificial, we have direct knowledge of all BMI
control signals throughout learning. However, it remains
unknown in the RLBMI how the normal neuronal activation
is remapped to the decoding system variables and new
environmental workspace. Since the decoding evolves
through experience, we have the opportunity to find causal
relationships. Specifically, we address the following
questions in the RLBMI context: Does neural tuning
direction change as the user learns to control the robot?
Does neuronal tuning depth change throughout learning?
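The two tuning metrics above can be estimated with the classic cosine-tuning regression. The sketch below is a generic illustration, not the paper's exact estimation procedure: it generates synthetic firing rates in 100 ms bins (the bin size used in the Methods) for a hypothetical unit, then recovers preferred direction and tuning depth by linear least squares on cosine and sine regressors:

```python
import numpy as np

# Synthetic data for one hypothetical unit; only the 100 ms bin size
# comes from the paper, everything else is illustrative.
rng = np.random.default_rng(0)
bin_s = 0.1                                      # 100 ms bins
true_pd, true_depth, baseline = np.pi / 4, 6.0, 10.0   # ground truth (spikes/s)

theta = rng.uniform(0, 2 * np.pi, 2000)          # movement direction in each bin
rate = baseline + true_depth * np.cos(theta - true_pd)  # cosine tuning model
counts = rng.poisson(np.clip(rate, 0, None) * bin_s)    # Poisson spike counts
rate_obs = counts / bin_s                        # binned firing-rate estimate

# Least squares: rate ≈ b0 + b1*cos(theta) + b2*sin(theta)
X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
b0, b1, b2 = np.linalg.lstsq(X, rate_obs, rcond=None)[0]

pd_hat = np.arctan2(b2, b1)     # tuning direction (preferred direction)
depth_hat = np.hypot(b1, b2)    # tuning depth (modulation amplitude)

print(f"preferred direction: {np.degrees(pd_hat):.1f} deg (true 45.0)")
print(f"tuning depth: {depth_hat:.2f} spikes/s (true 6.00)")
```

Fitting the same regression on data from early versus late brain-control sessions would let both questions above be answered quantitatively: a rotation of `pd_hat` indicates a change in tuning direction, and a change in `depth_hat` indicates a change in tuning depth.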
II. METHODS
A. RLBMI Experiment
We briefly summarize the experimental paradigm, neural
data acquisition and the computational framework of the
RLBMI [1, 2]. To investigate the RLBMI, we
designed an operant conditioning paradigm in which
three male Sprague Dawley rats were trained to
control a robotic arm and maneuver it to press levers
in the robotic workspace (see Fig. 1 for an overview).
The RLBMI decodes robot actions from the user’s
neuronal modulations. These modulations are recorded
bilaterally with 16 microwire electrodes chronically
implanted in each hemisphere of primary motor cortex (M1)
for a total of 32 electrodes. Single neuron action potentials
were detected and sorted using standard techniques [8] and
29 single units were discriminated for the rat in this study.
Surgical details are given in [9] and signal acquisition and
processing details in [1]. Modulations consist of the firing
rate of each discriminated neuron which was estimated in
non-overlapping 100 ms bins. We examine tuning in the
brain-control portion of this experiment and a time line of
this control is given in Fig. 2. Briefly, the rats were required
to control the robotic arm (see Fig. 1) using only neuronal
modulations from the single units we identified. A target
side was randomly selected for each trial, and the trial was
labeled left or right accordingly. Every 100 ms, the
CA must select which robot control action to take given the
Neuronal Tuning in a Brain-Machine Interface during Reinforcement Learning
Babak Mahmoudi, Student Member, Jack DiGiovanna, Student Member, Jose C. Principe, Fellow, and
Justin C. Sanchez, Member, IEEE
30th Annual International IEEE EMBS Conference
Vancouver, British Columbia, Canada, August 20-24, 2008
978-1-4244-1815-2/08/$25.00 ©2008 IEEE