Copyright is held by the author / owner(s).
SIGGRAPH Asia 2011, Hong Kong, China, December 12 – 15, 2011.
ISBN 978-1-4503-0807-6/11/0012

Around the World in 80 Seconds

Jean-Charles Bazin, CGL, ETHZ, Switzerland
Alexandre Richard, UTC, France
Yu-Wing Tai, KAIST, South Korea
Inso Kweon, RCV lab, KAIST, South Korea

Figure 1: Our system automatically generates a visually smooth image sequence “around the world in 80 seconds” from Internet images that summarize the cities visited on this tour.

1 Introduction and Related Work

In the famous book “Around the World in 80 Days”, published by Jules Verne in 1873, two characters set out to travel around the world in 80 days. Inspired by Verne’s adventure, we develop a system to virtually circumnavigate the world in 80 seconds by collecting and re-arranging a large collection of Internet images in a fully automatic manner, as shown in Figure 1.

Existing methods cannot handle our application. For example, R. Pergeaux and A. Profit¹ need a considerable amount of time and effort to manually select and align the pictures. Structure-from-motion algorithms (e.g. [Snavely et al. 2006]) can deal with thousands of images but would require an intractable amount of processing time and memory for world-scale scenes. Unlike in scene summarization (e.g. [Simon et al. 2007]), the scene and the physical locations of the images change during our tour. [Sivic et al. 2008] developed a system to navigate in a set of images, but it does not take into account geographic data (e.g. GPS position), temporal data (e.g. acquisition date) or higher-level information (e.g. objects/buildings present in the pictures or representative image selection).

2 Our approach

We automatically collect millions of pictures from Flickr and group the images from the same city into one cluster. A special sub-cluster “street”, detected using the method of [Oliva and Torralba 2001], is included within each city as a connector between landmarks or cities.
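As an illustration of this grouping, the following minimal Python sketch arranges image records into per-city clusters with a separate “street” sub-cluster. The record fields (`id`, `city`, `is_street`), the sub-cluster name `landmark` and the sample records are hypothetical; in the system described here the street label would come from the scene classifier, not from a precomputed flag.

```python
from collections import defaultdict


def build_clusters(images):
    """Group image ids by city, keeping street scenes in their own sub-cluster.

    Each record is assumed (hypothetically) to carry a city name and a
    boolean street flag produced by an earlier classification step.
    """
    clusters = defaultdict(lambda: {"landmark": [], "street": []})
    for image in images:
        sub_cluster = "street" if image["is_street"] else "landmark"
        clusters[image["city"]][sub_cluster].append(image["id"])
    return dict(clusters)


# Made-up records, for illustration only.
photos = [
    {"id": "p1", "city": "London", "is_street": False},
    {"id": "p2", "city": "London", "is_street": True},
    {"id": "p3", "city": "Paris", "is_street": False},
]
clusters = build_clusters(photos)
print(clusters)
```

Keeping the street images apart from the landmark images makes it easy to route a tour through them when moving between landmarks or cities.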
Our goal is to build an image sequence with visually smooth transitions. This aim can be cast as a graph problem: find the shortest path from a node in the first cluster to a node in the last cluster. We define the dissimilarity between two images $I$ and $I'$, which serves as the edge weight, by:

\[ d(I, I') = f_s(I, I') + \alpha_c f_c(I, I') + \alpha_v f_v(I, I') \tag{1} \]

where $f_s$ compares scene structure using GIST features [Oliva and Torralba 2001], $f_c$ is the $\chi^2$ distance between the global color histograms (i.e. color/tone similarity), $f_v$ is the spatial distance between the vanishing points [Kong et al. 2009], and $\alpha_c$ and $\alpha_v$ are relative weights, set to 10 and 100 respectively.

Given the shortest path as the skeleton path, the user might want to adjust the number of images in the sequence while maintaining smooth transitions. This can be achieved in real time by dynamically adding nodes to, or dropping nodes from, the skeleton path based on the edge weights.

Our system can also, if desired, automatically find the representative images within each (sub)cluster. This selection is performed using the method of [Simon et al. 2007] and will be presented in the Paris tour.

¹ http://www.youtube.com/watch?v=2N8NaUHR5XI

3 Experiments

Figure 1 presents a subset of our result image sequence for “Jules Verne’s world tour” (world scale). The whole sequence contains 80 images and constitutes an 80-second world tour movie.

Additional results and video sequences are presented at http://graphics.ethz.ch/~jebazin/WT80sec/, in particular a trans-US journey from New York State to California (country scale), a tour through the 10 most popular landmarks of Paris (city scale) and a tour of Ueno Park in Japan across the four seasons (temporal tour). The complete image sequences were obtained within a minute, given the similarity graph.
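To make the path search of Section 2 concrete, the following minimal Python sketch evaluates Equation (1) and searches for the cheapest path from the first cluster to the last. Several choices are assumptions, since the text does not specify them: Dijkstra's algorithm as the shortest-path method, the Euclidean distance for the GIST and vanishing-point terms, one common convention for the $\chi^2$ distance, edges only within a cluster and towards the next cluster, and the dictionary fields `gist`, `hist` and `vp`.

```python
import heapq
import itertools
import math

ALPHA_C = 10.0    # color weight, value given in the text
ALPHA_V = 100.0   # vanishing-point weight, value given in the text


def chi2_distance(h1, h2, eps=1e-12):
    """Chi-squared distance between two histograms (one common convention)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def edge_cost(a, b):
    """Equation (1): d = f_s + alpha_c * f_c + alpha_v * f_v."""
    f_s = math.dist(a["gist"], b["gist"])       # assumed: Euclidean distance
    f_c = chi2_distance(a["hist"], b["hist"])
    f_v = math.dist(a["vp"], b["vp"])           # assumed: Euclidean distance
    return f_s + ALPHA_C * f_c + ALPHA_V * f_v


def smoothest_tour(clusters):
    """Dijkstra search from any image of the first cluster to any of the last.

    clusters is a list of lists of image dicts, in tour order. Nodes are
    (cluster index, image index) pairs. Assumed connectivity: an image links
    to the images of its own cluster and of the next cluster.
    """
    last = len(clusters) - 1
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), (0, i), None) for i in range(len(clusters[0]))]
    heapq.heapify(heap)
    parent_of = {}
    while heap:
        cost, _, node, parent = heapq.heappop(heap)
        if node in parent_of:
            continue  # already settled with a lower or equal cost
        parent_of[node] = parent
        k, i = node
        if k == last:
            path = []
            while node is not None:
                path.append(node)
                node = parent_of[node]
            return cost, path[::-1]
        for k2 in (k, k + 1):
            for j, other in enumerate(clusters[k2]):
                if (k2, j) not in parent_of:
                    step = edge_cost(clusters[k][i], other)
                    heapq.heappush(heap, (cost + step, next(tie), (k2, j), node))
    return math.inf, []


def img(x, hist=(0.5, 0.5), vp=(0.0, 0.0)):
    """Toy image record with a one-dimensional stand-in for the GIST vector."""
    return {"gist": [x], "hist": list(hist), "vp": vp}


tour = [
    [img(0.0), img(5.0)],
    [img(1.0), img(6.0)],
    [img(2.0), img(9.0)],
]
cost, path = smoothest_tour(tour)
print(cost, path)
```

On this toy input, where only the structure term differs between images, the search selects the first image of each cluster, for a total cost of 2.0. The virtual start (every image of the first cluster at cost zero) lets a single search cover all possible starting images.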
4 Conclusion

In this paper, we have presented an original system that generates, in a fully automatic manner, a visually smooth image sequence from Internet images to visualize a traveling tour. Our system allows real-time dynamic adjustment of the number of images presented in the sequence, supports interactive path definition by the user, and automatically selects the representative images of landmarks. We demonstrated our system with tours of various graphical configurations and scales and also extended it to the temporal domain.

References

KONG, H., AUDIBERT, J.-Y., AND PONCE, J. 2009. Vanishing point detection for road detection. In CVPR’09.
OLIVA, A., AND TORRALBA, A. 2001. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV’01.
SIMON, I., SNAVELY, N., AND SEITZ, S. M. 2007. Scene summarization for online image collections. In ICCV’07.
SIVIC, J., KANEVA, B., TORRALBA, A., AVIDAN, S., AND FREEMAN, W. T. 2008. Creating and exploring a large photorealistic virtual space. In Workshop on Internet Vision (WIV’08).
SNAVELY, N., SEITZ, S. M., AND SZELISKI, R. 2006. Photo tourism: Exploring photo collections in 3D. In SIGGRAPH’06.