Video Inlays: A System for User-Friendly Matchmove

Dmitry Rudoy
dmitry.rudoy@gmail.com
Technion, Israel

Lihi Zelnik-Manor
lihi@ee.technion.ac.il
Technion, Israel

Figure 1: Adding an artificial object to an existing video usually requires high-end tools and intensive user interaction. Our system allows the user to inlay any number of objects, here a balcony and the wall lamps on the right, into any video via a simple user interface and minimal user interaction. This is achieved by representing the video structure and texture as a mosaic (center).

Abstract

Digital editing technology is highly popular because it makes it easy to change photos and add artificial objects to them. In contrast, video editing is still challenging and is mainly left to professionals. Even basic video manipulations involve complicated software tools that are typically not adopted by the amateur user. In this paper we propose a system that allows an amateur user to perform a basic matchmove by adding an inlay to a video. Our system does not require any previous experience and relies on simple user interaction. We allow adding 3D objects and volumetric textures to virtually any video. We demonstrate the method's applicability on a variety of videos downloaded from the web.

CR Categories: I.4.3 [Image Processing and Computer Vision]: Scene Analysis—Depth cues; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual reality; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Video analysis

Keywords: video editing, matchmove, video representation

1 Introduction

Video editing is composed of three main steps: sequencing, matchmoving and compositing. Sequencing covers managing the temporal dimension of a video, namely, rearranging scenes and modifying the time flow. Matchmoving refers to matching the motion of an artificial object to the camera movement, in order to place the object correctly in each frame.
Finally, compositing takes care of the seamless composition of two or more sequences.

In the world of professional video editing there are plenty of sophisticated tools for each of the three tasks. For instance, sequencing can be performed using Adobe® Premiere® or Apple's Final Cut Pro®. In high-end production Boujou is the common matchmoving tool, and Adobe® After Effects® or Sony® Vegas™ are used for compositing. Unfortunately, these very expensive software packages require high user skill and intensive user interaction.

In amateur video editing there is a lack of matchmoving and compositing tools. There are sequencing tools oriented toward home users, like Adobe® Premiere Elements® and CyberLink® PowerDirector®, but these are limited to basic effects, such as scene transitions and textual and image overlays. Although there are open-source tools like Blender.org, which is capable of matchmoving, its ease of use for untrained users is questionable. To perform even basic camera motion modeling the user is required to be an expert with the tool. Furthermore, the level of user interaction is very high. Therefore amateurs rarely perform any video edits beyond scene arrangement and textual overlays.

In this paper we propose a user-friendly system for amateur matchmoving. We do not presume to compete with the professional tools for matchmoving, but rather seek to achieve acceptable performance with as simple a user interaction as possible. The main contribution of the paper is a system that: allows adding multiple 3D objects or a volumetric texture to a video (see Figure 1); reduces the user interaction to a minimum by eliminating the need to check every video frame; enables adding an object or 3D texture with trivial interaction; and renders a basic composite video.

To achieve the desired simplicity our system follows several steps. First, we represent the entire video as a single mosaic image - this