A SPATIO-ANGULAR FILTER FOR HIGH QUALITY SPARSE LIGHT FIELD REFOCUSING

Martin Alain, Aljosa Smolic
V-SENSE Project, School of Computer Science and Statistics, Trinity College, Dublin

ABSTRACT

The ability to render synthetic depth-of-field effects post capture is a flagship application of light field imaging. However, many existing light field refocusing methods are known to suffer from severe artefacts, called angular aliasing, when applied to sparse light fields. We propose in this paper a method for high quality sparse light field refocusing based on insights from depth-based bokeh rendering techniques. We first provide an in-depth analysis of the geometry of the defocus blur in light field refocusing, by analogy with the defocus geometry in a traditional camera using the thin lens model. Based on this analysis, we propose a filter for removing angular aliasing artefacts in light field refocusing, which consists in modifying the well-known shift-and-sum algorithm to apply a depth-dependent blur to the light field between the shift and the sum operations. We show that our method achieves significant quality improvements compared to existing approaches for a reasonable computational cost.

Index Terms— Light field imaging, refocusing, angular aliasing, bokeh

1. INTRODUCTION

Light field imaging makes it possible to capture all light rays passing through a given region of 3D space [1, 2], in particular capturing the angular information which is lost in traditional 2D imaging systems. We focus in this paper on the common two-plane parameterisation of light fields, in which the light field is represented as a 4D function: Ω × Π → R, (s, t, u, v) ↦ p(s, t, u, v), where the plane Ω represents the spatial distribution of light rays, also called the image plane, indexed by (u, v), while Π, the camera plane, corresponds to their angular distribution, indexed by (s, t).
In practice, the light field parameterised with two parallel planes consists in a regularly sampled 2D grid of 2D images. The regular grid spacing on the camera plane is called the baseline, denoted b, while the 2D images are named sub-aperture images (SAIs). We consider in this paper the variables s, t, u, v to be metric, and define their corresponding scalar indices i, j, k, l, where i, j are camera indices and k, l are pixel indices. For convenience, we define L(i, j, k, l) = p(s, t, u, v) and we denote the SAIs by I_{i,j}(k, l) = L(i, j, k, l).

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under the Grant Number 15/RP/2776. This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 780470.

Applications of light fields notably include rendering novel image viewpoints [1, 3], estimating scene geometry in the form of disparity or depth maps [4–6], and synthetic depth-of-field rendering or refocusing [7, 8]. In this paper, we focus on the latter application, for which many methods have been proposed. The shift-and-sum algorithm [7, 9] is a simple and well-known method to produce refocused images from a light field, in which the light field SAIs are first shifted towards the target focal plane and then averaged. An extension of this concept to the Fourier domain was later proposed in [10]. More advanced filters in the 4D Fourier domain have then been proposed to perform volumetric refocusing [8]. More recently, the Fourier Disparity Layer representation has been proposed [11], which allows rendering and refocusing in real time by exploiting the parallelisation capabilities of modern GPUs.

However, the light field refocusing methods cited above exhibit artefacts, known as angular aliasing, when applied to sparse light field inputs.
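The shift-and-sum algorithm described above can be sketched in a few lines of NumPy. The following is a minimal illustration under our own assumptions (integer pixel shifts, circular border handling, a 4D array of shape (S, T, U, V)), not the authors' implementation; a practical version would interpolate fractional shifts and handle borders properly:

```python
import numpy as np

def shift_and_sum(lf, d):
    """Refocus a 4D light field lf of shape (S, T, U, V) by shift-and-sum.

    d is the signed per-view shift, in pixels per unit camera-grid offset,
    selecting the target focal plane: each SAI is shifted by its offset
    from the central view scaled by d, then all shifted SAIs are averaged.
    Integer shifts only, for simplicity of illustration.
    """
    S, T, U, V = lf.shape
    ci, cj = S // 2, T // 2                # central view indices
    out = np.zeros((U, V), dtype=np.float64)
    for i in range(S):
        for j in range(T):
            dk = int(round(d * (i - ci)))  # vertical shift for this SAI
            dl = int(round(d * (j - cj)))  # horizontal shift
            # np.roll performs a circular shift; border effects are
            # ignored in this sketch
            out += np.roll(lf[i, j], shift=(dk, dl), axis=(0, 1))
    return out / (S * T)
```

Objects whose inter-view disparity matches the chosen shift are aligned across all SAIs and rendered sharp, while objects at other depths are spread over the views and blurred by the averaging.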
The formal definition of densely (and by opposition sparsely) sampled light fields is given in the study of plenoptic sampling [12–14]. In [12], Chai et al. first provided guidelines for dense light field sampling. Considering the disparity between neighbouring SAIs, the condition for a densely sampled light field is that this disparity should not exceed one pixel. Such a condition is difficult to satisfy in practice, and many existing light field datasets are not strictly dense, in particular when captured with a gantry or a camera array. Therefore, multiple approaches have been developed to address angular aliasing in light field refocusing. A direct approach consists in reconstructing a dense light field from the sparse input before refocusing [15]. In order to avoid reconstructing a full dense light field or performing any pre-processing of the light field, Xiao et al. [16] proposed a method to detect angular aliasing using a statistical analysis of the refocused light field, and to reduce the aliasing by using lower resolution versions of the refocused image from a Gaussian pyramid, which are then fused with Poisson image editing techniques [17]. Wang et al. proposed to use depth-based bokeh rendering methods (discussed below) in order to avoid angular aliasing artefacts, combined with super-resolution of the in-focus region to render the final image [18]. A learning-based method was recently proposed in which the angular aliasing filtering is treated as a denoising problem solved with a convolutional neural network [19].

Predating light field refocusing, rendering synthetic bokeh has been a long-standing application in computer graphics [20–22]. By analysing the geometry of the defocus blur in traditional cameras using the thin lens model, the radius of the circle of confusion (CoC) can be expressed as a function of the aperture radius, the depth of the point light source, the depth of the focal plane, and the lens focal length.
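As an illustration of this thin-lens relationship, the standard CoC expression can be computed as below. This is the textbook thin-lens formula, not the paper's own derivation, and the function and parameter names are ours:

```python
def coc_radius(aperture_radius, focal_length, focus_depth, point_depth):
    """Radius of the circle of confusion (CoC) under the thin lens model.

    A point light source at distance point_depth from the lens, imaged by
    a lens of the given focal_length focused at focus_depth, produces on
    the sensor a blur disc of this radius. All quantities are in the same
    metric unit; a point exactly on the focal plane yields a radius of 0.
    """
    # sensor-side magnification of the in-focus plane
    m = focal_length / (focus_depth - focal_length)
    return aperture_radius * abs(point_depth - focus_depth) / point_depth * m
```

The radius grows with the aperture and with the point's distance from the focal plane, which is the behaviour a depth-based bokeh renderer reproduces per pixel using the depth map.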
Given an all-in-focus input image, its corresponding depth map and the camera parameters, a synthetic