Autonomous Performance of Multistep Activities with a Wheelchair Mounted Robotic Manipulator Using Body Dependent Positioning

Hairong Jiang, Student Member, IEEE, Ting Zhang, Student Member, IEEE, Juan P. Wachs, Member, IEEE, and Bradley S. Duerstock*, Member, IEEE

* This research is supported by the State of Indiana to the Center for Paralysis Research. H. Jiang, T. Zhang, and J. P. Wachs are with the School of Industrial Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: jiang115@purdue.edu, zhan1013@purdue.edu, jpwachs@purdue.edu). B. S. Duerstock is with the Weldon School of Biomedical Engineering and School of Industrial Engineering, Purdue University, West Lafayette, IN 47906 USA (765-496-2364; e-mail: bsd@purdue.edu).

Abstract—In this paper, an autonomous vision-based system was developed to control a wheelchair mounted robotic manipulator (WMRM). Two 3D cameras were used to recognize everyday objects and the wheelchair user's body parts (face and hands). Two human-robot interface modalities were used to control the WMRM: voice and gesture recognition. Everyday objects were recognized automatically through a two-step process: 1) extracting a feature vector for each detected object with the Histogram of Oriented Gradients (HOG) algorithm; and 2) training a nonlinear support vector machine (SVM) to classify the objects. Four simulated tasks involving the delivery and retrieval of everyday objects were designed to test the validity of the proposed system. The results demonstrated that automatic control requires significantly less time than predefined control for the phone-calling and photography tasks (P = 0.015 and P = 0.035, respectively), and that the gesture modality outperforms voice control for the drinking and phone-calling tasks (P = 0.016 and P = 0.015, respectively).

I. INTRODUCTION

The advancement of assistive robotics has facilitated the development of wheelchair mounted robotic manipulators (WMRMs) for people with disabilities (PWDs). These WMRMs work in close proximity to PWDs to assist with activities of daily living (ADLs), such as dressing, feeding, and object retrieval and delivery. WMRMs improve the accessibility of the surroundings for PWDs and enhance their independence [1]. Previous research has shown that a WMRM system is beneficial to individuals with mobility impairments, such as spinal cord injury (SCI) [2] and cerebral palsy [3]. An intelligent assistive robotic manipulator system named UCF-MANUS was developed by Kim et al. [4] for users with a wide range of disabilities. Essential to this system and other WMRMs is the integration of computer vision to recognize everyday objects. For example, Fence et al. [5] applied a monocular camera for object recognition using the scale-invariant feature transform (SIFT) to control a 7-degree-of-freedom (DoF) robotic arm. Body parts of the operator were also recognized to help automate daily tasks. Tanaka et al. [6] developed an assistive WMRM that grasps a cup and brings it to the user's mouth with the help of two cameras mounted in the robotic arm's hand (one used to recognize objects and the other to recognize the user's face).

Recently, the availability of commercial WMRMs has increased. For example, the JACO robotic manipulator produced by Kinova® and the Cyton Gamma 1500 [7] developed by ROBAI® (a lightweight 7-DoF robotic arm) are designed to be mounted on a wheelchair and to help users with upper limb impairments with instrumental and basic ADLs [8].

Previous studies on human-robot interfaces (HRIs) for robotic manipulator control have been based on different input modalities. For instance, Pathirage et al. [9] developed a vision-based Brain Computer Interface (BCI) for grasping objects with a WMRM. Patients with tetraplegia were trained to voluntarily modulate electroencephalogram (EEG) signals to send commands to the WMRM. Three modalities were adopted in the WMRM system presented by Kim et al. [4]: joystick, touchscreen, and BCI. Other control modalities for WMRM systems include speech recognition [10], head movement and facial expression [11], hand gestures [12], EEG signals [13], and a 3-D controller [14]. Our previous work consists of designing a gesture recognition-based interface for individuals with quadriplegia due to SCI [12] and developing a prototype vision-based WMRM system (with manual and semi-automatic control modes) that combines hand gestural control with automatic user face and object detection for quickly retrieving everyday objects [15]. The drawback of the previously developed system was that it required the user to manually control the robotic arm to perform fine movements when grasping an object.

In this paper, we extend the functionality and robustness of the object recognition algorithm, the HRI modalities we test, and the robotic control policy used. Integrated computer vision algorithms are applied to detect, recognize, and grasp objects automatically. The user's body parts (face and hands) are tracked to facilitate object positioning. Additionally, proximity signals from a smartphone are used to provide feedback for the user's safety and the tasks' efficiency. Moreover, the system was tested with more complex, multistep tasks to simulate real-world needs.
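To make the recognition step concrete, a minimal sketch of the two-step HOG-plus-nonlinear-SVM pipeline is given below, assuming OpenCV and scikit-learn; the 64x64 patch size, the HOG block/cell parameters, and the RBF-kernel settings are illustrative assumptions rather than the exact values used in our system.

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    # HOG descriptor over a fixed 64x64 window (16x16 blocks, 8x8 block
    # stride, 8x8 cells, 9 orientation bins); parameters are assumptions.
    hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

    def hog_feature(patch_bgr):
        """Step 1: extract the HOG feature vector of a detected object patch."""
        gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
        resized = cv2.resize(gray, (64, 64))
        return hog.compute(resized).ravel()

    def train_object_classifier(patches, labels):
        """Step 2: train a nonlinear (RBF-kernel) SVM on the HOG vectors."""
        X = np.array([hog_feature(p) for p in patches])
        clf = SVC(kernel="rbf", C=10.0, gamma="scale")
        clf.fit(X, labels)
        return clf

    def recognize(clf, patch_bgr):
        """Classify a new object patch (e.g., 'cup', 'phone', 'camera')."""
        return clf.predict([hog_feature(patch_bgr)])[0]

Given labeled training patches for each object class, train_object_classifier is fit once offline, and recognize is then applied to each candidate patch segmented from the camera stream.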
II. SYSTEM ARCHITECTURE

The architecture of this prototype system is illustrated in Fig. 1. The computer vision-based WMRM system includes five modules: (A) user interface with gesture and speech control, (B) automatic object recognition, (C) human body part recognition, (D) object sensors, and (E) the robotic arm control module. Four multistep tasks were designed to test this system: drinking, phone calling, taking a self-portrait or ‘selfie’ photograph, and typical picture taking.
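As a rough illustration of module (C), the sketch below locates the user's face in a color frame and derives a nearby delivery target in image coordinates. The stock Haar-cascade detector and the fixed pixel offset in delivery_target are assumptions made only for this sketch; the system's 3D cameras additionally provide the depth required for body-dependent positioning.

    import cv2

    # Stock OpenCV frontal-face Haar cascade (an assumed detector choice).
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def locate_face(frame_bgr):
        """Return the (x, y) image center of the largest detected face, or None."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep largest face
        return (x + w // 2, y + h // 2)

    def delivery_target(face_center, offset=(0, 80)):
        """Hypothetical helper: place the object slightly below the face
        center (e.g., near the mouth) by applying a fixed pixel offset."""
        if face_center is None:
            return None
        return (face_center[0] + offset[0], face_center[1] + offset[1])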