A COMPUTER VISION SYSTEM FOR MONITORING MEDICATION INTAKE

David Batz, Michael Batz, Niels da Vitoria Lobo, Mubarak Shah
University of Central Florida
da435238@ucf.edu, mi665063@ucf.edu, niels@cs.ucf.edu, shah@cs.ucf.edu

Abstract

We propose a computer vision system to assist a human user in the monitoring of their medication habits. This task must be accomplished without knowledge of any pill locations, as pills are too small to track with a static camera and are usually occluded. At the core of this process is a mixture of low-level, high-level, and heuristic techniques such as skin segmentation, face detection, template matching, and a novel approach to hand localization and occlusion handling. We discuss the approach taken towards this goal, along with the results of our testing phase.

1. INTRODUCTION

Automatically detecting whether complex tasks are being performed by humans often requires using multiple recognition and tracking methods. In our case, we are tracking a user as they interact with medication bottles, using only one color camera. Specifically, we would like to know if the user opened a medicine bottle, placed their hand up to their mouth, and then closed the bottle.

Our motivation for this system comes from the fact that some people on multiple medications have difficulty remembering which pills to take and when to take them. A vision system which can assist with the tracking of a specific person's medication habits would therefore be useful. Such a system must check four requirements:

1. The right user is taking the medication.
2. The right medication is being taken.
3. The right dosage is being taken.
4. The medication is being taken at the right time.

Our system concentrates on problem (2) and provides a framework for (3). Problem (1) is outside the scope of this paper, and problem (4) is relatively easy to solve and will not be discussed here.

1.1. Assumptions

We assume there is one camera monitoring a medication area containing a number of medication bottles already in view. The inputs required by the system are the number of bottles in view, the bottle detection training data, and a skin color predicate, all of which are constructed offline. The caps of any medication bottles used must also require a twisting motion to be opened.

Listed next are the assumptions made about the user. Only one user is close to the camera in the video sequence. There must be a short initial period of time when the user appears and their face is not occluded, so that the face can be automatically initialized and tracked during future occlusions. For now, we assume the user places only one pill in their hand at a time, as tracking the pill itself is not possible. We can, however, look for some improper forms of usage, such as the pill bottle being brought up to the mouth, or the repetition of a hand moving between an open bottle and the mouth.

In this paper we also present novel algorithms for hand localization and occlusion handling. Subsequent sections detail each step of the system, and then we present results.

1.2. Main Algorithm

Below we list the main steps of the system:

Load first frame of a sequence {
    Compute lighting correction values
    Automatically initialize bottle tracking templates
    Load skin color predicate
}
FOR (each consecutive frame) {
    Apply lighting correction and Gaussian filter
    Find skin regions using YCbCr predicate
    Apply morphological operations to skin regions
    Compute regional properties of skin regions
    Check for skin occlusions
    Repair any skin occlusions found
    Localize face
    Localize hands
    Track medication bottles
    Determine if any requirements are being met
}

2. SKIN SEGMENTATION

For each frame, two pre-processing steps are applied to maximize the effectiveness of subsequent operations. The first step consists of a lighting correction routine, which helps remove any color biasing.
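To make the "find skin regions using YCbCr predicate" step of the main loop concrete, the sketch below classifies pixels by a rectangular box in the Cb/Cr plane. This is only an illustration: the paper's predicate is built offline from training data, and the threshold ranges here are commonly cited illustrative values, not the authors' trained predicate.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB image to YCbCr (BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of candidate skin pixels. A fixed Cb/Cr box stands in
    for the offline-built skin color predicate (assumed values)."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

# A skin-toned pixel next to a pure-blue pixel
img = np.array([[[200, 140, 110], [0, 0, 255]]], dtype=np.uint8)
mask = skin_mask(img)
```

In the real system this per-pixel test would be followed by the morphological clean-up and region analysis listed in the main algorithm.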
For the lighting correction, we adopted a histogram-based method similar to the one described in [5].

Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV'05), © 2005 IEEE
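As a rough illustration of what such a histogram-based correction can accomplish, the sketch below stretches each color channel's percentile range to full scale, which removes a global color cast. This is a stand-in under stated assumptions, not the specific method of [5]; the percentile bounds are arbitrary illustrative choices.

```python
import numpy as np

def lighting_correction(img, lo_pct=1.0, hi_pct=99.0):
    """Per-channel histogram stretch: map each channel's
    [lo_pct, hi_pct] percentile range to [0, 255]. A simple
    stand-in for the histogram-based correction of [5]."""
    img = img.astype(np.float64)
    out = np.empty_like(img)
    for c in range(3):
        lo = np.percentile(img[..., c], lo_pct)
        hi = np.percentile(img[..., c], hi_pct)
        scale = 255.0 / max(hi - lo, 1e-6)
        out[..., c] = np.clip((img[..., c] - lo) * scale, 0, 255)
    return out.astype(np.uint8)

# A gray scene given a synthetic warm cast: red and green inflated relative to blue
rng = np.random.default_rng(0)
gray = rng.integers(40, 200, size=(32, 32, 1))
cast = np.concatenate([gray + 40, gray + 30, gray], axis=2).astype(np.uint8)
corrected = lighting_correction(cast)
```

Because each channel is normalized against its own histogram, the per-channel bias introduced by the cast is largely removed before skin segmentation runs.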