A Field-Validated Architecture for the Collection of Health-Relevant Behavioural Data

Dylan L. Knowles, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, dylan.knowles@usask.ca
Kevin G. Stanley, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, kevin.stanley@usask.ca
Nathaniel D. Osgood, Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada, nathaniel.osgood@usask.ca

Abstract—Human behaviour is an underlying factor in many diseases. Behavioural data has traditionally been collected through interviews, surveys, and direct observation. While these methods offer significant insight, they have drawbacks including bias, limited recall accuracy, and low temporal fidelity. Automated data collection devices such as GPS trackers have helped to reduce these problems while increasing objectivity and fidelity. Modern smartphones provide sensors that can replicate the functionality of dedicated devices while offering ubiquity, near-perpetual presence, and the ability to perform ecological momentary assessment. This has spurred researchers to propose or deploy smartphone data collection tools. Not all of these tools, however, are well designed, thoroughly tested, or easily extended. To realize the potential of this technology in the health sphere, careful attention must therefore be paid to the underlying software architecture and its robustness. To this end, we present a highly flexible, reconfigurable, and verifiable software architecture for monitoring health-related behaviours, constructed using modern software engineering principles. We detail here the process-stream abstractions that underlie its data collection and management processes. Its efficacy is demonstrated through a retrospective analysis of deployments of the system, which have targeted domains as diverse as flu transmission and gamified interventions for sedentary behaviour.

I. INTRODUCTION

Medical advances, new treatment methodologies, and novel health interventions often result from an increased understanding of human physiology, pathology, and behaviour. However, the behaviours targeted by treatments and interventions, such as how the dynamics of human contact contribute to the spread of disease [1], are still imperfectly understood. The capacity to effectively capture, quantify, and analyze human health behaviour could radically change how we understand, design, and deliver interventions.

Diaries and surveys have traditionally been used [2]–[4] to glean insight into the daily activities, motivations, and thoughts of individuals. While these tools generate important insight, they have significant drawbacks [2]. Flawed and biased recall or recording by both participants and experimenters can skew results; adherence to study protocol may decay with time; and behavioural distortions induced by knowledge of monitoring or by power relationships between observer and observed [5] can all degrade data quality. Given how frequently these limitations are noted in the medical, sociological, public health, psychological, and even geographical [3] literature, the development of a better solution seems desirable.

Special-purpose sensor hardware, once employed in the study of structural integrity, animal populations [6], and environmental phenomena [7], has been adapted for use in human research [8], [9]. These hardware systems can monitor individuals with minimal user input, substantially increasing the temporal fidelity of data. These devices, however, do not address all of the problems faced by researchers. Sensor hardware can be left behind, fail [6], or be easily destroyed [4], [8], and it can be obtrusive enough to disrupt normal participant behaviour [3]. Some hardware solutions are also quite expensive [4].
Leveraging electronic monitoring systems that individuals already carry would alleviate many of the drawbacks of custom solutions while retaining the benefits of continuous, low-maintenance data collection. Smartphones can be used to capture rich, epidemiologically relevant data on their owners and on populations [2], [5], [10], [11]. Between 47% and 55% of North Americans own these devices, meaning that a substantial population already possesses hardware with experimentally useful sensing capacity [12], [13]. Smartphones have the added benefit of allowing questionnaires and on-screen forms to be administered [5], [10], [11].

The potential of smartphone-based behavioural monitoring has prompted many researchers to explore the field [5], [10], [11], [14]–[16]. Few of these efforts, however, report moving beyond the prototype stage, being used in studies other than those for which they were originally designed, or being validated by external parties. Proof-of-concept systems are typically difficult to extend, modify, and generalize, and they are prone to bugs [17]. Tools without clear, concrete plans for expansion are difficult to apply outside of their intended domains. Existing systems with plans for long-term use [5], [14] have either reported use and evaluation only within their own group [14] or do not appear to support the perpetual, ubiquitous data collection that many studies need [5]. The proliferation of systems presents many attractive ideas, but few real-world examples of their use have been reported. While some organizations aim to reduce development barriers in the health sphere [18], there appears to be no standardized, well-travelled approach for health studies with fine-grained data collection needs.
We have attempted to address these problems of extensibility, generality, and verifiability through the introduction, use, and continual development of a system named iEpi [10], [11], which is based on a core architecture that facilitates easy extension, testing, and subcomponent reuse. Over the past four years we