Int. J. Man-Machine Studies (1979) 11, 717-728

Perceptual and memory factors in simulated machine-aided speaker verification

MARK HAGGARD† AND QUENTIN SUMMERFIELD†

Department of Psychology, Queen's University, Belfast

(Received 8 September 1978, and in revised form 4 December 1978)

Speaker verification by machine alone may be more accurate than by a human listener, but it is slower and demands powerful programs and peripherals. Simple recording devices can juxtapose a claimant utterance with a stored sample to provide rapid verification by human judgement, but this raises the question of how to optimize the sample size between insufficient information and an overload of auditory memory. To identify the processes at work in such judgements, a simulation was conducted of the situation where a human operator verifies claimant speakers against stored samples of a standard utterance. Realism was incorporated by restricting signals to telephone frequency bandwidth, while both control and a stringent level of difficulty were incorporated by the selection of five better-than-average imposters and five more than averagely imitable male speakers. Naive, unselected listeners participated. With a 9-syllable sentence lasting about 2 seconds, correct acceptances varied from 92% to 100% and false acceptances from 54% to 21%. Conditions in which the length of the sample was reduced in various ways gave lower performance. The major factor differentiating the performance of individual subjects was a bias factor: the degree to which "same" responses predominated over "different" responses. Despite this, the different sample conditions tended to produce a fixed percentage of acceptance responses rather than a proportion varying with the available sensitivity in the fashion of an optimal decision-maker. The data justify several conclusions.
(1) Listeners can integrate speaker information over periods as long as 2 seconds and probably longer.
(2) Improvement in performance can result from increasing the length of either the claimant utterance or the stored sample even when the other cannot be increased. Thus it appears that listeners are extracting and storing parameters characterising the style of a speaker rather than matching a raw sound image.
(3) Speaker verification by skilled listeners should be able to reach levels of sensitivity which, in combination with manipulations of the acceptance criterion, would ensure tolerably low false acceptance rates.
(4) Training of the listener in speaker verification should involve training of acceptance criteria as well as perceptual discrimination training.

† Now at M.R.C. Institute of Hearing Research, University of Nottingham, Nottingham NG7 2RD, U.K.

0020-7373/79/060717+12 $02.00/0 © 1979 Academic Press Inc. (London) Limited

Introduction

Where artificial intelligence has progressed insufficiently to make it acceptable, economic or otherwise desirable to replace human functions by machine functions, it is frequently suggested that man and machine should interact, each bringing their special powers to bear on different aspects of a problem, e.g. the heuristic and algorithmic respectively. Where the task is a perceptual rather than cognitive one, this compromise is limited to relatively simple automatic aids such as rapid file search and display, because human perceptual judgements can only be used to the full when the stimuli for them are adequate facsimiles of objects and events in the real world. The foci of