A System to Capture, Share and Access Personal Memories

Rui M. Jesus 1,2, Tiago Martins 2, Rute Frias 2, Arnaldo J. Abrantes 1, Nuno Correia 2
1 Multimedia and Machine Learning Group, Instituto Superior de Engenharia de Lisboa
2 Interactive Multimedia Group, DI/FCT, New University of Lisbon
Quinta da Torre, 2825 Monte da Caparica, Portugal
http://img.di.fct.unl.pt
rjesus@deetc.isel.ipl.pt, nmc@di.fct.unl.pt

ABSTRACT

This paper presents a system to access personal memories composed of digital pictures. The system consists of a retrieval engine, a desktop interface for sharing personal memories, and a mobile user interface that supports capture and automatic annotation of images. The retrieval engine uses Global Positioning System (GPS) location data, low-level visual features and previously trained semantic concepts to retrieve images. With the mobile interface, people can capture and share personal pictures and navigate the physical space when visiting historical sites, museums and other tourist attractions using their Personal Digital Assistants (PDAs). Visitors can take photos and submit them to the system to receive contextually related photos taken by others or by themselves. Experimental results are presented to show the performance of the retrieval mechanisms and the usefulness of the navigation system.

Keywords: Personal Memories, User Interfaces, Multimedia Information Retrieval.

1. INTRODUCTION

Sharing memories and experiences is a major human activity that has been practiced for thousands of years, enabling the exchange of knowledge and information across generations and cultures. Pictures and videos are rich vehicles for this information. Recent advances in digital devices have been changing how visual information is captured, shared and stored (e.g., camera-phones with integrated GPS and considerable storage capacity). People can take photos or record short video clips of everything, everywhere, and share this information with friends through the World Wide Web. Consequently, a vast amount of personal digital information is being produced and stored.

Stored personal information can play an important role in supporting daily activities. It can help a doctor in the diagnosis process [10] or help people remember important things about their lives (e.g., meetings, trips and holidays). Some applications (e.g., finding lost objects) require continuous or passive capture of personal photos or videos [3, 4, 8], but other applications related to personal memories do not need this kind of capture, in particular those related to leisure activities such as holidays, visiting a museum or a birthday party.

Sometimes people want to recall a previous moment or the people they were with at a given site. This is usually done on a Personal Computer (PC), but when visiting historical sites, museums and other tourist attractions, people may want to share their pictures or to browse previous photos of the same place, possibly captured by other people. Additionally, images taken by others can help in choosing the path for a given visit. In both situations, efficient multimedia information retrieval is essential to access personal information. Several applications and interfaces for accessing this information on a PC have been proposed.
Commercial applications (e.g., Adobe Photoshop Album, Paint Shop Pro and Picasa) and online sites like www.flickr.com or www.phlog.net are available to manage personal memories with pictures. Most of these applications use manual annotations to search photos. Manual annotation is the most effective approach, but it is time consuming. Ethnographic studies [2] show that people usually do nothing to organize their photographs and only occasionally create an album of a special event. Automatic systems rely on context metadata or on visual content [1, 6, 9]. Most of these systems combine Content-Based Image Retrieval (CBIR) techniques with context metadata in order to bridge the semantic gap [11], a common problem in CBIR systems [7], which have difficulty capturing semantic concepts (e.g., flowers or people) using the low-level features extracted from the images.

This work, described in more detail in the next sections, is a system to capture, share and access personal memories composed of digital pictures. PhotoNav consists of two user interfaces, a desktop interface and a mobile user interface. Both interfaces are built on a retrieval system that uses visual content and context metadata.

2. SYSTEM OVERVIEW

PhotoNav comprises a desktop interface, a mobile interface and a retrieval system that runs on a server. The retrieval system uses Global Positioning System (GPS) location data, low-level visual features and previously trained semantic concepts to retrieve images. All the information and associated metadata are stored on a server, which also performs the image processing needed to find similar images (the retrieval system).

The desktop interface allows searching and sharing large image databases using semantic concepts. Semantic concepts are obtained by training binary classifiers (e.g., indoor/outdoor) with the Regularized Least Squares Classifier (RLSC) and can be combined to express more complex concepts. To combine several generic concepts, the sigmoid function is applied to the output of the RLSC, and the images are ranked according to their probability of belonging to the classes of the chosen concepts (see the first sketch at the end of this section). This method was evaluated for several concepts suitable for personal memories (e.g., outdoor, people, nature) in our previous work [5] with good results.

The mobile application is based on an interface where the user can capture pictures and formulate several types of queries to retrieve photos from the server. The personal information returned can support the user's navigation in the physical space (e.g., when visiting historical sites or museums). This application uses low-level features, semantic concepts and context metadata to query the database (a sketch of how such cues might be fused follows at the end of this section). It supports three types of queries:

1. Query by image - visitors want to see images similar to the query, which can be a picture taken by them. Low-level
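As an illustration of the concept-ranking step described above, the following is a minimal sketch of RLSC training, sigmoid calibration and ranking. The RBF kernel, the regularization value, the product rule for combining concepts, and the arrays X_train, y and X_db are all our assumptions for illustration, not details given in the paper.

```python
# Minimal sketch of concept-based ranking with RLSC + sigmoid.
# Assumes hypothetical arrays: X_train (training features), y (+1/-1
# concept labels), X_db (database features); names are ours, not the paper's.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise squared Euclidean distances -> Gaussian (RBF) kernel.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def train_rlsc(X_train, y, lam=1e-3):
    # RLSC: solve (K + lam*n*I) c = y; then f(x) = sum_i c_i k(x, x_i).
    n = len(y)
    K = rbf_kernel(X_train, X_train)
    c = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda X: rbf_kernel(X, X_train) @ c

def concept_probability(f, X):
    # Sigmoid maps the raw RLSC output to a pseudo-probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-f(X)))

# Combining two trained concepts (e.g., "outdoor" AND "people") by a
# product of probabilities is an independence assumption on our part:
#   scores = concept_probability(f_outdoor, X_db) * \
#            concept_probability(f_people, X_db)
#   ranking = np.argsort(-scores)   # best-matching images first
```

One reason RLSC suits this setting is that training each binary concept reduces to solving a single linear system, so many generic concepts can be trained cheaply and combined at query time.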
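The paper does not spell out how GPS context and visual similarity are fused, so the sketch below shows only one plausible scheme. The haversine distance, the exponential decay, the linear weighting alpha, and the inputs query_feat/db_feats (feature vectors) and query_gps/db_gps ((lat, lon) pairs) are all our assumptions.

```python
# Hedged sketch: fusing GPS proximity with low-level visual similarity.
import numpy as np

def haversine_km(p, q):
    # Great-circle distance between two (lat, lon) pairs given in degrees.
    lat1, lon1, lat2, lon2 = map(np.radians, (*p, *q))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def combined_score(query_feat, query_gps, db_feats, db_gps, alpha=0.5):
    # Visual term: similarity derived from Euclidean feature distance.
    visual = 1.0 / (1.0 + np.linalg.norm(db_feats - query_feat, axis=1))
    # Context term: similarity decaying with geographic distance.
    dists = np.array([haversine_km(query_gps, g) for g in db_gps])
    spatial = np.exp(-dists)
    # Linear fusion; images would be ranked by descending score.
    return alpha * visual + (1 - alpha) * spatial
```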