Recognizing Daily Activities in Realistic Environments through Depth-Based User Tracking and Hidden Conditional Random Fields for MCI/AD Support

Dimitris Giakoumis 1, Georgios Stavropoulos 1,2, Dimitrios Kikidis 1, Manolis Vasileiadis 1, Konstantinos Votis 1, and Dimitrios Tzovaras 1

1 Information Technologies Institute, CERTH, Thessaloniki, Greece
2 University of Patras, Patras, Greece

Abstract. This paper presents a novel framework for the automatic recognition of Activities of Daily Living (ADLs), such as cooking, eating, dishwashing and watching TV, based on depth video processing and Hidden Conditional Random Fields (HCRFs). Depth video is provided by low-cost RGB-D sensors unobtrusively installed in the house. The user’s location and posture, as well as point-cloud-based features related to gestures, are extracted; a standing/sitting posture detector and novel features expressing head and hand gestures are introduced herein. To model the target activities, we employed discriminative HCRFs and compared them to Hidden Markov Models (HMMs). In the experimental evaluation, HCRFs outperformed HMMs in ADL detection based on location trajectories. Fusing trajectory data with posture and the proposed gesture features further improved ADL detection performance, leading to recognition rates at the level of 90.5% for five target activities in a naturalistic home environment.

Keywords: ADL recognition, user location trajectories, posture, gestures, point-cloud features, hidden conditional random fields

1 Introduction

Automatic domestic activity recognition is a significant challenge on the way toward future homes equipped with robotic applications capable of monitoring the resident’s behaviour, identifying abnormalities and assisting in the establishment of daily activities [8]. This is of particular importance in cases of Mild Cognitive Impairment (MCI) or Alzheimer’s Disease (AD), where activity monitoring can facilitate early diagnosis of cognitive decline [2].
Typically, the recognition of Activities of Daily Living (ADLs) [12], such as cooking, eating and dishwashing, has been approached through ambient sensors [3] monitoring the house environment [23], as well as the locations visited by the monitored person [13][9]. In recent years, relevant research efforts have focused on RGB video processing [6][5][18][31] or, especially after the emergence of the Kinect sensor, on RGB-D images [30][4].