International Journal of Computer Applications (0975-8887)
Volume 179, No. 35, April 2018

Object Detection and their Localization for Visually Impaired Users

Twinkle Motwani, Sejal Gianani, Abhishek Mehta, Rohan Shende
Computer Engineering, V.E.S.I.T., Mumbai, India

Sharmila Sengupta
Professor, Computer Engineering Dept., V.E.S.I.T., Mumbai, India

ABSTRACT
Visual impairment is the functional loss of one or both eyes in which a person's eyesight cannot be corrected to a normal level. Visually impaired people face many problems in day-to-day life, and the most common inconvenience is finding personal items that have been misplaced in their indoor space. With advances in technology, the normal conduct of such tasks can be made possible. This paper describes a system, a user-friendly human-computer interface enhanced with computer vision technology, that supports the visually impaired in localizing and picking up objects used in their daily life. Several efforts have been made in the past few years to improve the quality of life of such people. The aim of our system is to bridge the gap between what a disabled person wants to do and what the existing social infrastructure allows them to do.

Keywords
Visually impaired; Object detection; Object positioning; Deep Learning.

1. INTRODUCTION
According to the World Health Organization, 253 million people live with visual impairment: 36 million are blind and 217 million have moderate to severe vision impairment. Computer vision can be delineated as the field of science and artificial intelligence concerned with building artificial systems that obtain knowledge from images or multi-dimensional data and give computers a visual understanding of the world. Some of the latest technologies developed for the blind include smart glasses [1] that can read and recognize faces, the Finger reader, the Blind reader, the co-robotic cane (a GPS-based assistive device) [2], etc.
These existing systems succeed at object recognition and navigation [3], but have not achieved much in determining the exact position of an object so that the user can grab it. We have arrived at a method for determining the position of an object with respect to the user's hand. This method guides the user to move his hand and reach the object. It combines methods such as deep-learning-based object detection, neural networks, and image processing techniques (feature extraction, morphology). The project helps visually impaired people by increasing their confidence and making them more independent. The system plays a vital role in society: it can be used in blind training institutes, non-profit organizations, and indoor environments such as homes, offices, and schools.

2. METHODOLOGY
2.1. System Design
There are several modules in the system design, which are depicted in Figure 1. First, the user carries the device, an integrated hardware unit developed for this purpose. Images are captured via a live video stream through a camera module. The video is divided into frames, which are resized and converted into blobs; in this context a blob is the preprocessed image tensor (resized, scaled, and mean-subtracted) that serves as input to the network. This data is then fed to the deep learning neural network. Deep learning neural networks differ from single-hidden-layer neural networks in their depth, i.e., the number of node layers through which data passes in a multi-step process. Earlier neural networks had one input layer, one output layer, and at most one hidden layer in between; a network with more than three layers, including input and output, is considered deep learning. This is followed by the object detection module, implemented using OpenCV, which detects all objects included in our dataset. Next comes the object mapping and positioning module. After detecting the objects, the system tells the user which objects are in front of him.
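The frame-to-blob step above can be sketched in plain NumPy. This is a minimal illustration of the preprocessing that OpenCV's DNN module performs (its `cv2.dnn.blobFromImage` function does the same resize, mean-subtraction, and channel reordering); the 300x300 input size and the scale/mean values below are common MobileNet-SSD settings and are assumptions, since the paper does not name the exact model.

```python
import numpy as np

def frame_to_blob(frame, size=(300, 300), scale=1 / 127.5, mean=127.5):
    """Resize an HxWx3 frame (nearest-neighbour) and pack it into a
    1x3xHxW float blob, the layout expected by OpenCV DNN models."""
    h, w = frame.shape[:2]
    # Nearest-neighbour resize: pick a source index for each target pixel.
    rows = np.arange(size[1]) * h // size[1]
    cols = np.arange(size[0]) * w // size[0]
    resized = frame[rows][:, cols].astype(np.float32)
    # Mean subtraction and scaling, then HWC -> CHW with a batch axis.
    blob = ((resized - mean) * scale).transpose(2, 0, 1)[np.newaxis, ...]
    return blob

# Example: a synthetic 480x640 frame becomes a 1x3x300x300 blob.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
blob = frame_to_blob(frame)
print(blob.shape)  # -> (1, 3, 300, 300)
```

In a real pipeline the resulting blob would be passed to the loaded network (e.g. `net.setInput(blob)` followed by `net.forward()` in OpenCV) to obtain the detections.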
The user tells the system via voice commands which object he wants to grab; his speech is converted to text. The required object is then mapped against the real-time input and becomes the only object shown in the frame. Distance calculations are performed, and the angle through which the user should move his hand to grab the object is computed. As our system is designed for objects within a vicinity of 1 meter, the system checks whether the object is within an arm's length (1 meter), and text-to-speech conversion takes place accordingly. The system then guides the user with appropriate audio instructions (i.e., angle and distance) for grabbing the object.
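The distance-and-angle guidance step can be sketched as follows. The paper does not give its formulas, so this is only an illustration under the assumption that the hand and object positions are available as 2D coordinates in meters; the function name `guidance` and the constant `ARM_LENGTH_M` are hypothetical, with the 1-meter threshold taken from the text.

```python
import math

ARM_LENGTH_M = 1.0  # reachable radius stated in the text

def guidance(hand, obj):
    """Given hand and object positions as (x, y) coordinates in meters,
    return the distance to the object, the angle (degrees, anticlockwise
    from the positive x-axis) through which the hand should move, and
    whether the object is within arm's length."""
    dx, dy = obj[0] - hand[0], obj[1] - hand[1]
    distance = math.hypot(dx, dy)          # Euclidean distance
    angle = math.degrees(math.atan2(dy, dx))
    return distance, angle, distance <= ARM_LENGTH_M

# Example: object 0.3 m to the right and 0.4 m ahead of the hand.
dist, ang, reachable = guidance((0.0, 0.0), (0.3, 0.4))
print(f"move {dist:.2f} m at {ang:.1f} degrees, reachable={reachable}")
# -> move 0.50 m at 53.1 degrees, reachable=True
```

The returned values would then be rendered as audio instructions by the text-to-speech module.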