FATHOM: A Neural Network-based Non-verbal Human Comprehension Detection System for Learning Environments

Fiona J. Buckingham, Keeley A. Crockett, Zuhair A. Bandar, James D. O’Shea
School of Computing, Mathematics and Digital Technology
Manchester Metropolitan University
Chester Street, Manchester, M1 5GD, UK
fiona.j.buckingham@stu.mmu.ac.uk

Abstract—This paper presents the application of FATHOM, a computerised non-verbal comprehension detection system, to distinguish participant comprehension levels in an interactive tutorial. FATHOM detects high and low levels of human comprehension by concurrently tracking multiple non-verbal behaviours using artificial neural networks. Presently, human comprehension is predominantly monitored from written and spoken language. There is therefore considerable scope for exploring human comprehension detection from a non-verbal behavioural perspective using artificially intelligent computational models such as neural networks. In this paper, FATHOM was applied to a video-recorded exploratory study containing a learning task designed to elicit high and low comprehension states from the learner. The learning task comprised watching a video on termites, suitable for the general public, followed by an interview-led question and answer session. This paper describes how FATHOM’s comprehension classifier artificial neural network was trained and validated in comprehension detection using the standard backpropagation algorithm. The results show that high and low comprehension states can be detected from learners’ non-verbal behavioural cues with testing classification accuracies above 76%.

Keywords—artificial neural networks; backpropagation; comprehension; FATHOM; non-verbal behaviour

I. INTRODUCTION

Non-verbal behaviour is a form of non-linguistic communication that automatically accompanies verbal conversation. Gestures, facial expressions, and body movement are all examples of non-verbal behaviour [1].
Little work has been done on automatic comprehension detection, yet humans exhibit non-verbal cues consistently while undertaking day-to-day tasks. Thus, the research presented in this paper seeks to examine whether patterns of comprehension and non-comprehension exist within non-verbal behavioural cues. Previous classroom studies [2-7] have identified non-verbal behavioural indicators of non-comprehension, including facial behaviour and hand and body movements. However, this work has largely relied on subjective human coding [8], with its associated inconsistency, and upon verbal techniques. Thus, there is a role for a non-verbal, multichannel comprehension detection system capable of reliably classifying human comprehension through facial non-verbal behaviour.

Comprehension is often associated with written language [9] and is commonly defined as “the process of simultaneously extracting and constructing meaning through interaction and involvement with written language” [9]. In this research, we define comprehension as the learner demonstrating, through interaction with a tutorial (via verbal communication and/or non-verbal behaviour), that they understand or grasp the meaning of the tutorial material presented to them at a given point in time. The tutorial in this paper (described in Section V) comprised each participant watching a factual video and participating in a question and answer (Q&A) session immediately afterwards.

FATHOM [10] is an artificial neural network (ANN) based system developed specifically to detect levels of comprehension. FATHOM was developed around an existing physiological profiling system known as Silent Talker [11] and was first trialled during an informed consent assessment process carried out in North-western Tanzania, Africa, using a setting similar to that used for a Human Immunodeficiency Virus (HIV)/Acquired Immunodeficiency Syndrome (AIDS) prevention randomized study [10].
The work produced strong evidence [10] that detectable patterns of comprehension and miscomprehension exist within the monitored facial non-verbal multichannels for the sample of African women, using a limited set of non-verbal behavioural features. Initial observations provide grounds to suspect that additional, less obvious micro-gestures are available for classification.

The aim of the research presented in this paper is to apply FATHOM as a comprehension detection system to a learning task designed to distinguish high and low comprehension states from the learner based on facial non-verbal cues. In order to assess FATHOM’s ability, a new exploratory study was designed to capture comprehension levels of adults over the age of 18. The motivation for this work is ultimately to link FATHOM to pedagogical intervention in learner-adaptive online teaching and learning tutorials that could be delivered in 24/7 scenarios to improve the overall learning experience.