Reflectance Estimation for Free-viewpoint Video

George Ash, Dimension, United Kingdom, george@dimensionstudio.co
Juraj Tomori, Dimension, United Kingdom, juraj@dimensionstudio.co
Mike Pelton, Dimension, United Kingdom, mike@dimensionstudio.co
Charles Dupont, Dimension, United Kingdom, charles@dimensionstudio.co

Figure 1: Left: renderer output, ground truth image feed. Right: the output of our method: roughness, albedo, normal maps.

ABSTRACT

We present a method to infer physically-based material properties for free-viewpoint video. Given a multi-camera image feed and reconstructed geometry, our method infers material properties such as albedo, surface normal, metallic and roughness maps. We use a physically based, differentiable renderer to generate candidate images which are compared against the image feed. Our method searches for material textures which minimise an image-space loss metric between candidate renders and the ground truth image feed. Our method produces results that approximate state-of-the-art reflectance capture, and produces texture maps that are compatible with common real-time and offline shading models.

ACM Reference Format:
George Ash, Juraj Tomori, Mike Pelton, and Charles Dupont. 2021. Reflectance Estimation for Free-viewpoint Video. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Posters (SIGGRAPH '21 Posters), August 09-13, 2021. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3450618.3469146

1 INTRODUCTION

Capturing photorealistic, high-quality performances of humans for use in film and video games remains an active area of research in computer graphics and computer vision. Whilst there have been significant developments in reflectance capture for free-viewpoint video [Guo et al. 2019], less work has been done to augment captures under a single lighting condition, the setup most commonly used for free-viewpoint video captures today. Previous work in free-viewpoint video [Collet et al.
2015] captures geometry and a diffuse texture, but illumination is baked onto the surface of the capture, and the method struggles to reconstruct specular surfaces. Later work [Guo et al. 2019] improves relightability by computing albedo, surface normal, shininess and ambient occlusion maps, but requires a complex lighting setup producing spherical gradients and careful synchronization between lighting changes and the video feed. Our work assumes the same 106-camera setup as in [Collet et al. 2015], but improves on the method by separating the 'baked-in' diffuse texture into several physically-based maps, allowing for improved relightability and reproduction of view-dependent reflectance.

We frame the problem as an inverse-graphics optimisation procedure: by minimising the image-space loss between candidate renders and a ground-truth image feed from the camera array, we can optimise for albedo, roughness, metalness and normal maps.

2 OUR METHOD

To allow our candidate renders to match the camera feed as closely as possible, we implemented a differentiable physically-based shading model in PyTorch, which uses nvdiffrast [Laine et al. 2020] for fast, differentiable rendering abstractions.

2.1 Lighting

We implement a GGX microfacet BRDF, described in [Karis 2013].
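As a point of reference, the GGX specular term from [Karis 2013] can be written directly in differentiable PyTorch operations; the sketch below is an illustration under our own naming conventions (the tensor arguments are assumed to be precomputed, broadcast-compatible dot products), not the authors' exact implementation:

```python
import torch

def ggx_specular(n_dot_l, n_dot_v, n_dot_h, v_dot_h, roughness, f0):
    """GGX microfacet specular term, following [Karis 2013].

    All arguments are tensors of clamped-positive dot products, a
    perceptual roughness map, and the Fresnel reflectance at normal
    incidence (f0). Every operation is differentiable, so gradients
    flow back to the roughness (and other material) textures.
    """
    eps = 1e-7
    alpha = roughness ** 2  # [Karis 2013] squared-roughness reparameterisation

    # D: GGX normal distribution function
    denom = n_dot_h ** 2 * (alpha ** 2 - 1.0) + 1.0
    d = alpha ** 2 / (torch.pi * denom ** 2 + eps)

    # F: Schlick Fresnel approximation
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

    # G: Schlick-GGX geometry term with the direct-lighting k
    k = (roughness + 1.0) ** 2 / 8.0
    g1 = lambda x: x / (x * (1.0 - k) + k)
    g = g1(n_dot_l) * g1(n_dot_v)

    # Cook-Torrance specular BRDF
    return d * f * g / (4.0 * n_dot_l * n_dot_v + eps)
```

Because the term is built from elementary tensor operations, autograd can differentiate a rendered-image loss with respect to the roughness texture, which is what makes the optimisation in section 2.2 possible.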
We can then accurately light our base geometry, given a prefiltered HDR environment map of the capture stage (seen in figure 2) and the material texture variables. We found it necessary to inhibit strong specular effects in cloth areas, so we multiply the specular term by a weighting factor, which we expose as a variable texture map in the optimisation procedure.

2.2 Image Loss

To ensure our candidate renders and camera feed line up in image space, we use the same calibration procedure described in [Collet et al. 2015]. We can then combine the camera extrinsics, such as world space position and rotation, with the camera intrinsics, such as focal length, to produce a pinhole projection matrix. We also