Abstract—As the amount of multimedia data grows and multi-modal information becomes widespread, there is an increasing need for methods that can analyze composite information and retrieve items of one modality based on another. For content-based retrieval, models that incorporate multiple modalities concurrently are widely regarded as essential. In this study, we propose a method to reconstruct and retrieve images through text-to-image cross-modal recall by hypernetworks. In our method, a probabilistic graphical model called a hypernetwork learns the patterns of relations between text keywords and images, and images related to given keywords are reconstructed based on the learned relation patterns. The original images are then retrieved according to similarities evaluated between the reconstructed images and the originals. Experimental results on Korean magazine articles show that when text keywords are given as a query, the original images related to the keywords are retrieved. In addition, the results show that when both image patches and text keywords are given, the images are reconstructed more precisely.

Index Terms—Image reconstruction, pattern matching, pattern recognition, text processing

I. INTRODUCTION

CROSS-MODAL learning refers to a methodology based on transition from one modality to another; for example, given auditory information, the data is converted to visual information with identical or similar contents. Cross-modal data retrieval and generation are important with respect to both applications and cognitive science. First, cross-modal techniques are applied to multimedia data mining [1], [2]. For example, 'text-to-image' can be applied to content-based image search, and 'image-to-text' can be used for auto-tagging.
Also, with respect to cognitive science, cross-modal retrieval is an attempt to imitate the perception and cognition related to multi-modality in the brain [3], [4]. The contents of articles have their own subjects, and these subjects are usually represented in more than one modality, such as sentences and images. Text words and images related to the subject are used in an article so that its content represents the subject consistently. Therefore, we can assume that relations exist between the words and images used in an article. Moreover, we can try to find a mapping method to convert one modality to the other based on this relation information. That is, when linguistic keywords are given as a query, we can acquire images related to the given keywords. Since keywords and images are represented with a large number of features, cross-modal retrieval requires a generative model that can represent relations among high-dimensional features. In this study, we use the hypernetwork model [5] as the generative model. The hypernetwork is a weighted hypergraph in which evolutionary methods are embedded as learning strategies. We use Korean magazine articles as experimental data and generate images through text-to-image cross-modal reconstruction using hypernetworks. In addition, we introduce a similarity measure that evaluates the difference between reconstructed images and original images in order to retrieve the most similar images. Experimental results show that when text keywords are given, images are reconstructed and the original images related to the keywords can be retrieved. Also, when partial images are given together with the query, the reconstructed images are more recognizable.

This work was supported in part by IITA through the IT R&D program (IITA-2009-A1100-0901-1639, MARS), in part by a KRF grant funded by the Korean Government (MOEHRD) (KRF-2008-314-D00377), and in part by the BK21-IT program funded by the Korean Government (MEST).
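The paper's actual recall and matching procedures are detailed in Section III; purely as an illustrative sketch of the idea outlined above, the hypothetical Python code below builds random fixed-order hyperedges over mixed keyword/patch features, recalls image patches from query keywords by weighted voting, and retrieves the most similar original image. All function names, the edge-sampling scheme, and the Jaccard similarity are our own assumptions for illustration, not the paper's algorithm or similarity measure.

```python
import random
from collections import Counter

def learn_hypernetwork(examples, edge_order=3, edges_per_example=50, seed=0):
    """Sample fixed-order hyperedges over mixed keyword/patch features.

    examples: list of (keyword_set, patch_id_set) pairs per article.
    Returns a list of frozenset hyperedges (multiplicity acts as weight).
    """
    rng = random.Random(seed)
    edges = []
    for keywords, patches in examples:
        pool = [("kw", k) for k in keywords] + [("px", p) for p in patches]
        for _ in range(edges_per_example):
            if len(pool) >= edge_order:
                edges.append(frozenset(rng.sample(pool, edge_order)))
    return edges

def recall_patches(edges, query_keywords, top_n=10):
    """Recall: vote for patch features from hyperedges matching the query keywords."""
    query = {("kw", k) for k in query_keywords}
    votes = Counter()
    for edge in edges:
        overlap = len(edge & query)
        if overlap:
            for kind, feature in edge:
                if kind == "px":
                    votes[feature] += overlap
    return {patch for patch, _ in votes.most_common(top_n)}

def retrieve(examples, reconstructed):
    """Return the index of the original image most similar (Jaccard) to the reconstruction."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return max(range(len(examples)), key=lambda i: jaccard(examples[i][1], reconstructed))

# Toy data: (keyword set, image-patch-ID set) per article (hypothetical values).
data = [({"seoul", "food"}, {"p1", "p2", "p3"}),
        ({"travel", "beach"}, {"p4", "p5", "p6"})]
edges = learn_hypernetwork(data)
rec = recall_patches(edges, {"seoul"})   # patches recalled for the query
best = retrieve(data, rec)               # index of the best-matching article
```

In this toy run, hyperedges containing the keyword "seoul" only ever co-occur with patches of the first article, so the recalled patch set points back to it; the paper's real setting uses high-dimensional keyword and image-patch features rather than toy IDs.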
The rest of this paper is organized as follows. Section II summarizes the background for this study, and Section III presents the method of cross-modal text-to-image reconstruction with hypernetworks. Section IV presents the experimental results. Finally, Section V concludes with a summary and future work.

II. BACKGROUNDS

A. Cross-modal Learning

The term "cross-modal" comes from cross-modal perception, cross-modal integration, and cross-modal plasticity in the brain.

Text-to-Image Cross-Modal Retrieval of Magazine Articles Based on Higher-order Pattern Recall by Hypernetworks
Jung-Woo Ha, Byoung-Hee Kim, Hyun-Woo Kim, Woongchang Yoon, Jae-Hong Eom, and Byoung-Tak Zhang
School of Computer Science and Engineering, Seoul National University, Seoul, Korea
Email: {jwha, bhkim, hwkim, wcyoon, jheom, btzhang}@bi.snu.ac.kr
The 10th International Symposium on Advanced Intelligent Systems (ISIS 2009)