A Lightweight and Efficient Mechanism for Fixing the Synchronization of Misaligned Subtitle Documents

Rodrigo Laiola Guimarães
IBM Research
Rua Tutóia 1157
04007900 São Paulo, Brazil
+55 11 2132 2283
rlaiola@br.ibm.com

Priscilla Avegliano
IBM Research
Rua Tutóia 1157
04007900 São Paulo, Brazil
+55 11 2132 5790
pba@br.ibm.com

Lucas C. Villa Real
IBM Research
Rua Tutóia 1157
04007900 São Paulo, Brazil
+55 11 2132 4548
lucasvr@br.ibm.com

ABSTRACT
Online subtitle databases allow users to easily find subtitle documents in multiple languages for thousands of films and TV series episodes. However, getting a subtitle document that gives satisfactory synchronization on the first attempt is like hitting the jackpot. In practice, this process often involves a lot of trial and error, because multiple versions of subtitle documents have distinct synchronization references, given that they are targeted at variations of the same audiovisual content. Building on our previous efforts to address this problem, in this paper we formalize and validate a two-phase subtitle synchronization framework. The benefit over current approaches lies in the use of audio fingerprint annotations generated from the base audio signal as second-level synchronization anchors. This allows the media player to dynamically fix, during playback, the most common cases of subtitle misalignment that compromise users’ watching experience. Results from our evaluation indicate that our framework has minimal impact on existing subtitle documents and formats as well as on playback performance.

CCS Concepts
• Information systems → Multimedia information systems, Speech / audio search
• Applied computing → Document management and text processing, Document metadata, Document preparation, Annotation, Format and notation, Multi / mixed media creation.

Keywords
Subtitles; Audio fingerprinting; Synchronization; SRT.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
DocEng '16, September 12 - 16, 2016, Vienna, Austria.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-4438-8/16/09…$15.00
DOI: http://dx.doi.org/10.1145/2960811.2960812

1. INTRODUCTION
Downloading a subtitle document from the Internet and playing it alongside audiovisual content (e.g., a movie or a TV series episode) is not rocket science, but it sure can feel that way sometimes. Even when a user already has the media file on a local device and has identified multiple candidate subtitle documents in an online repository, he or she still has to figure out which of these files gives satisfactory synchronization. The problem is that, even with the efforts of online communities to review and correct user-contributed subtitle documents, and with media players that try to download suitable subtitle documents automatically, the user may still repeatedly run into versions that do not sync up perfectly with the base audiovisual content. Even if the synchronization is off by just a couple of seconds, misaligned subtitle entries will most likely be a constant annoyance.

(a) (b)
Figure 1. Playback of audiovisual content together with subtitle documents using a local media player: a) subtitle entries with perfect timing and b) shifted by t seconds. Screenshots extracted from “Ridley Scott + IBM Watson: A Conversation”. Available at https://youtu.be/KDtxQRH8aI4.

Take Figure 1 as an example. Here, we illustrate the playback of two subtitle documents with the corresponding audiovisual content (track in light blue with dot pattern). Figure 1.a represents the ideal scenario, where subtitle entries (in yellow with line pattern) are perfectly synchronized with the base content. In Figure 1.b, on the other hand, the timing of all subtitle entries (in orange with line pattern) is shifted by t seconds. Note that this latter