UNSUPERVISED MULTIMODAL PROCESSING

Abel Nyamapfene 1 and Khurshid Ahmad 2

1 School of Engineering, Computer Science and Mathematics
Harrison Building, North Park Road, University of Exeter
Exeter EX4 4QF, United Kingdom
Email: a.nyamapfene@ex.ac.uk

2 Department of Computer Science
O'Reilly Institute, Trinity College, Dublin 2, Ireland
Email: kahmad@tcd.ie

ABSTRACT
We present two separate algorithms for unsupervised multimodal processing. Our first proposal, the single-pass Hebbian-linked self-organising map network, significantly reduces the training time of Hebbian-linked self-organising maps by computing in a single epoch the weights of the links associating the separate modal maps. Our second proposal, based on the counterpropagation network algorithm, implements multimodal processing on a single self-organising map, thereby eliminating the network complexity associated with Hebbian-linked self-organising maps. When assessed on two bimodal datasets, an audio-acoustic speech utterance dataset and a phonological-semantics child utterance dataset, both approaches achieve smaller computation times and lower crossmodal mean squared errors than traditional Hebbian-linked self-organising maps. In addition, the modified counterpropagation network leads to higher crossmodal classification percentages than either of the two Hebbian-linked self-organising map approaches.

KEY WORDS
Hebbian-linked self-organising maps, Multimodal, Crossmodal, Neural Networks

1 Introduction

The Hebbian-linked self-organising map architecture [1][2] was originally developed by Miikkulainen as a computational model of the human lexical system at the level of physical structure (i.e. brain topographic maps and pathways), but its use has since extended to unsupervised multimodal processing in diverse applications ranging from computational models of cognitive processing [1][2][3][4][5] to multimedia information processing and storage [6].
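To make the architecture concrete, the sketch below shows two minimal one-dimensional self-organising maps joined by a Hebbian link matrix, trained in situ on paired bimodal patterns. This is an illustrative toy implementation, not the authors' code: the map sizes, learning rate, neighbourhood width, random data, and the simple co-occurrence counting and normalisation of the link weights are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

class MiniSOM:
    """Minimal 1-D self-organising map with a Gaussian neighbourhood."""
    def __init__(self, n_units, dim):
        self.weights = rng.random((n_units, dim))

    def winner(self, x):
        # index of the unit whose weight vector is closest to x
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def update(self, x, lr=0.1, sigma=1.0):
        w = self.winner(x)
        dist = np.abs(np.arange(len(self.weights)) - w)
        h = np.exp(-(dist ** 2) / (2 * sigma ** 2))  # neighbourhood function
        self.weights += lr * h[:, None] * (x - self.weights)
        return w

# Two modal maps, e.g. phonological and semantic representations
map_a = MiniSOM(n_units=8, dim=3)
map_b = MiniSOM(n_units=8, dim=4)
hebb = np.zeros((8, 8))  # Hebbian link weights between the two maps

# Paired bimodal training patterns (random stand-ins for real data)
data_a = rng.random((50, 3))
data_b = rng.random((50, 4))

# In situ training: link weights are updated concurrently with the maps
for xa, xb in zip(data_a, data_b):
    wa = map_a.update(xa)
    wb = map_b.update(xb)
    hebb[wa, wb] += 1.0  # strengthen the link between co-active winners
hebb /= hebb.max()  # normalise link strengths to [0, 1]

# Crossmodal recall: activity on map A propagates through the links to map B
wa = map_a.winner(data_a[0])
predicted_b_unit = int(np.argmax(hebb[wa]))
```

The single-pass variant proposed in this paper would instead train the two maps to convergence first and then fill in `hebb` with one pass over the paired data, avoiding repeated link updates during map formation.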
In the Hebbian-linked self-organising maps architecture, each of the self-organising maps is trained to self-organise on one modal representation of the input patterns, and the Hebbian links encode the co-occurrence information between the two modal representations, thereby forming a crossmodal representation of the whole data set. Training of the Hebbian links proceeds concurrently with the training of the self-organising maps. In this paper, we shall refer to this form of training as in situ training, and we shall refer to the networks so trained as in situ Hebbian-linked self-organising maps.

Whilst the use of separate modal maps linked to each other by Hebbian link networks is justifiable in modelling multimodal cognitive phenomena, this approach easily leads to network implementations that grow more unwieldy as the number of modes to be encoded increases. This is undesirable in most other applications, such as unsupervised multimedia data processing, where the goal is to implement multimodal data processing as efficiently as possible. To address this problem, we propose two independent solutions. Our first proposal, which we shall refer to as the single-pass Hebbian-linked self-organising maps (SOMs) approach, reduces the computational expense associated with the Hebbian link weights by updating them in only a single epoch of training after the modal maps have been separately trained. Our second proposal, based on the counterpropagation network training algorithm [7][8], implements crossmodal mapping using only one self-organising map.

The rest of the paper is organised as follows: In Section 2 we discuss the training algorithm for the in situ Hebbian-linked SOMs architecture, and then, using this as a basis, we develop the single-pass algorithm for training Hebbian-linked SOMs and the modified counterpropagation algorithm. In Section 3 we discuss the experimental procedures we use to compare the two