Exp Brain Res (2011) 214:131–137
DOI 10.1007/s00221-011-2812-y

RESEARCH ARTICLE

Task relevance predicts gaze in videos of real moving scenes

Christina J. Howard · Iain D. Gilchrist · Tom Troscianko · Ardhendu Behera · David C. Hogg

Received: 7 February 2011 / Accepted: 23 July 2011 / Published online: 6 August 2011
© Springer-Verlag 2011

Abstract  Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382–390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search, where stimuli are in constant motion and where the 'target' of the visual search is abstract and semantic in nature. Here, we investigate this issue when participants continuously search an array of four closed-circuit television (CCTV) screens for suspicious events. We recorded eye movements whilst participants watched real CCTV footage and moved a joystick to continuously indicate perceived suspiciousness. We find that when multiple areas of a display compete for attention, gaze is allocated according to relative levels of reported suspiciousness. Furthermore, this measure of task relevance accounted for twice as much variance in gaze likelihood as the amount of low-level visual change over time in the video stimuli.

Keywords  Visual search · Scene perception · Eye movements · Attention

Introduction

In daily life, we are often presented with scenes containing semantically distinct regions. In the office environment, workers are frequently presented with a number of software interfaces containing several independent streams of information competing for attention. In the street, traffic events may be causally and semantically independent from events and information in nearby shops, bus or tram stops, offices and advertising. Whilst driving, we are confronted with the external street view and also an array of controls, navigation and communication devices. In these contexts, how do we search the world for semantically defined events? For example, in a crowd, we may want to identify the person who is likely to approach us; at a children's party, we may want to detect accidents; and in the jungle, we will definitely want to monitor for the approach of a wide array of predators. In all these cases, the target event is not defined by a strict set of visual characteristics but instead requires high-level evaluation of a changing scene. In fact, target events in these cases are not necessarily directly associated with any strict set of image characteristics at all. Rather, they tap into much higher-level, meaning-based evaluations of scene events. Target events may be localised to a small section of the video image, or they may relate to several locations at once, for example the perceived intentions of one person towards another in spatially distant parts of the scene. In addition, the target and the search environment are dynamic, and the target event will unfold over time. To what extent will overt attention (eye movements) be guided by high-level information in these kinds of scenes?

There has been much interest in predicting eye movements based on low-level characteristics of visual stimuli. Itti and Koch (2000) proposed a saliency model that predicts attention allocation on the basis of colour, orientation and intensity contrasts in the image.
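To make concrete what a measure of "low-level visual change over time" might look like in practice, the following is a minimal sketch of frame-differencing over greyscale video frames, with per-region averaging (for example, one cell per CCTV screen in a 2 × 2 array). The function names, grid layout and parameter values are illustrative assumptions, not the authors' actual analysis pipeline.

```python
# Hypothetical sketch: quantifying low-level visual change between
# consecutive video frames, one simple proxy for the kind of
# low-level signal the paper compares against task relevance.
# The 2 x 2 region grid and all values below are assumptions.

import numpy as np

def change_map(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Per-pixel absolute luminance change between two greyscale frames."""
    return np.abs(next_frame.astype(float) - prev_frame.astype(float))

def regional_change(diff: np.ndarray, rows: int = 2, cols: int = 2) -> np.ndarray:
    """Mean change within each cell of a rows x cols grid
    (e.g. one cell per CCTV screen in a 2 x 2 array)."""
    h, w = diff.shape
    cells = diff[: h - h % rows, : w - w % cols]          # trim to divisible size
    cells = cells.reshape(rows, h // rows, cols, w // cols)
    return cells.mean(axis=(1, 3))                        # one value per screen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.random((240, 320))               # stand-in greyscale frames
    f1 = f0.copy()
    f1[60:120, 200:280] += 0.5                # simulate motion in one screen
    print(regional_change(change_map(f0, f1)))  # top-right cell shows most change
```

Averaging the change map within each screen's region yields one time series per screen, which could then be compared against per-screen gaze likelihood or a joystick-reported relevance signal.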
C. J. Howard (✉) · I. D. Gilchrist · T. Troscianko
Department of Experimental Psychology, University of Bristol, Bristol, UK
e-mail: christina.howard@ntu.ac.uk

C. J. Howard
Psychology Division, Nottingham Trent University, Nottingham, UK

A. Behera · D. C. Hogg
School of Computing, University of Leeds, Leeds, UK