Proceedings of the 9th Sound and Music Computing Conference, Copenhagen, Denmark, 11-14 July, 2012

EVALUATING HOW DIFFERENT VIDEO FEATURES INFLUENCE THE VISUAL QUALITY OF RESULTANT MOTIONGRAMS

Alexander Refsum Jensenius
University of Oslo, Department of Musicology, fourMs lab
a.r.jensenius@imv.uio.no

ABSTRACT

Motiongrams are visual representations of human motion, generated from regular video recordings. This paper evaluates how different video features may influence the generated motiongram: inversion, colour, filtering, background, lighting, clothing, video size and compression. It is argued that the proposed motiongram implementation is capable of visualising the main motion features even with quite drastic changes in all of the above-mentioned variables.

1. INTRODUCTION

The last decade has seen a rapid growth of interest in studying music-related body motion [1-3]. Music-related motion is here used to describe all types of body motion that appear in a musical context, including those carried out by performers (i.e. musicians, conductors, dancers) and perceivers (e.g. in concerts, discos, on the bus). This includes a large variety of motion types, all of which may occur in any type of location, e.g. concert halls, clubs, at home, in the street, or on the bus.

Having tools and methods for recording, visualising and analysing music-related motion is important for empirical music researchers. Various types of marker/sensor-based motion capture systems excel at providing high-quality data, which is useful for both qualitative and quantitative analyses. However, a major challenge with most such motion capture systems is that they require markers/sensors to be put on the body of the subject, which makes them less ideal for recording, say, a musician in concert.
Another problem with the data obtained from motion capture systems is that they are focused on capturing the position of markers, or possibly body joints, and may not capture the global qualities of complex body motion satisfactorily. Here regular video recordings excel, albeit with a trade-off in terms of lower resolution/speed compared to motion capture systems.

All in all, I believe that a regular video recording is still among the most flexible, cheapest and most accessible solutions for recording music-related motion. Extracting useful information from regular video recordings is a challenge, however, and is often computationally heavy and based on many assumptions about the content of the video file.

Copyright: © 2012 A. R. Jensenius. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Figure 1. From a video recording of a pianist performing the opening of Beethoven's Tempest Sonata. Motion history image (top left), horizontal motiongram (top right), vertical motiongram (bottom left) and a similarity matrix of the audio spectrum (bottom right).

As opposed to analysis-based visualisation techniques, motiongrams are a simple and straightforward reduction-based approach to creating visual displays of continuous motion over time [4]. An example of how motiongrams may be used to study a performer's motion can be seen in Figure 1. Here the horizontal and vertical motiongrams represent vertical and horizontal motion, respectively. Since two motiongrams are shown, a similarity matrix of the audio spectrum is used so that it is possible to compare motion to sound in both dimensions.
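To make the reduction idea concrete, the core of the technique described in [4] can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual implementation: it assumes the video has already been loaded as a grayscale array of shape (time, height, width) with values in [0, 1], and that the motion image is a simple thresholded frame difference. The function name `motiongrams` and the `threshold` parameter are hypothetical choices for this sketch.

```python
import numpy as np

def motiongrams(frames, threshold=0.05):
    """Sketch of reduction-based motiongrams.

    frames: array of shape (T, H, W), grayscale, values in [0, 1].
    Returns (horizontal, vertical):
      horizontal: (H, T-1) image, time on the x-axis, rows preserved,
                  which displays vertical motion over time.
      vertical:   (T-1, W) image, time on the y-axis, columns preserved,
                  which displays horizontal motion over time.
    """
    frames = np.asarray(frames, dtype=float)
    # Motion image: absolute difference between consecutive frames,
    # thresholded to suppress low-level video noise.
    motion = np.abs(np.diff(frames, axis=0))  # shape (T-1, H, W)
    motion[motion < threshold] = 0.0
    # Collapse one spatial axis per motiongram and stack over time.
    horizontal = motion.mean(axis=2).T  # average each row  -> (H, T-1)
    vertical = motion.mean(axis=1)      # average each column -> (T-1, W)
    return horizontal, vertical
```

Averaging (rather than, say, taking the maximum) over the collapsed axis is one possible reduction; either way, no tracking or model of the video content is required, which is what makes the approach robust to the variations evaluated later in the paper.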
The vertical motiongram effectively visualises the phrasing in the transverse (horizontal) plane of the performer, while the horizontal motiongram displays the continuous attacks in the hands, as well as weight shifts in the legs, and the pedal activity of the right foot.

Motiongrams have been used for visualising many different types of music-related motion over the years [5], and even in studies of young infants at risk of developing cerebral palsy [6]. The technique has proven to be flexible, scalable, and tolerant of changes in the input video files, but there has not yet been any systematic testing of the im-