Reflectance Estimation for Free-viewpoint Video
George Ash
Dimension
United Kingdom
george@dimensionstudio.co
Juraj Tomori
Dimension
United Kingdom
juraj@dimensionstudio.co
Mike Pelton
Dimension
United Kingdom
mike@dimensionstudio.co
Charles Dupont
Dimension
United Kingdom
charles@dimensionstudio.co
Figure 1: Left: renderer output and ground-truth image feed. Right: the output of our method: roughness, albedo, and normal maps.
ABSTRACT
We present a method to infer physically-based material properties
for free-viewpoint video. Given a multi-camera image feed and
reconstructed geometry, our method infers material properties,
such as albedo, surface normal, metallic and roughness maps. We
use a physically based, differentiable renderer to generate candidate
images which are compared against the image feed. Our method
searches for material textures which minimise an image-space loss
metric between candidate renders and the ground truth image feed.
Our method produces results that approximate state-of-the-art
reflectance capture, and produces texture maps that are compatible
with common real-time and offline shading models.
ACM Reference Format:
George Ash, Juraj Tomori, Mike Pelton, and Charles Dupont. 2021. Reflectance
Estimation for Free-viewpoint Video. In Special Interest Group
on Computer Graphics and Interactive Techniques Conference Posters (SIG-
GRAPH ’21 Posters), August 09-13, 2021. ACM, New York, NY, USA, 2 pages.
https://doi.org/10.1145/3450618.3469146
1 INTRODUCTION
Capturing photorealistic, high quality performances of humans for
use in film and video games remains an active area of research in
computer graphics and computer vision. Whilst there have been
significant developments in reflectance capture for free-viewpoint
video [Guo et al. 2019], less work has been done to augment captures
under a single lighting condition, which is most commonly in use
for free-viewpoint video captures today.
Previous work in free-viewpoint video [Collet et al. 2015] captures
geometry and a diffuse texture, but illumination is baked
onto the surface of the capture, and the method struggles to re-
construct specular surfaces. Later work [Guo et al. 2019] improves
relightability by computing albedo, surface normal, shininess and
ambient occlusion maps, but requires a complex lighting setup pro-
ducing spherical gradients and careful synchronization between
lighting changes and the video feed. Our work assumes the same
106-camera setup as in [Collet et al. 2015], but improves on the
method by separating the 'baked-in' diffuse texture into several
physically-based maps, allowing for improved relightability and
reproduction of view-dependent reflectance.
We frame the problem as an inverse-graphics optimisation procedure:
by minimising the image-space loss between candidate renders and a
ground-truth image feed from the camera array, we can optimise for
the following material properties: albedo, roughness, metalness and
normals.
2 OUR METHOD
To allow our candidate renders to match the camera feed most
closely, we implemented a differentiable physically-based shading
model in PyTorch, which uses nvdiffrast [Laine et al. 2020] for fast,
differentiable rendering abstractions.
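As a toy illustration of this optimisation loop (not the paper's implementation, which differentiates through a full GGX shading model with nvdiffrast and optimises several texture maps), the following NumPy sketch recovers a per-pixel albedo map by gradient descent on an image-space L2 loss, under an assumed known per-pixel irradiance:

```python
import numpy as np

def optimise_albedo(target, irradiance, steps=500, lr=0.1):
    """Recover a per-pixel albedo map so that albedo * irradiance
    matches the target image, via gradient descent on a squared
    image-space loss. A hand-derived-gradient stand-in for the
    full pipeline, which uses automatic differentiation through
    a differentiable renderer."""
    albedo = np.full_like(target, 0.5)               # neutral initial guess
    for _ in range(steps):
        render = albedo * irradiance                 # candidate render
        grad = 2.0 * (render - target) * irradiance  # d(loss)/d(albedo)
        albedo = np.clip(albedo - lr * grad, 0.0, 1.0)
    return albedo
```

On a synthetic "capture" built as target = true_albedo * irradiance, the loop recovers true_albedo closely within a few hundred steps; the real method replaces this closed-form gradient with automatic differentiation through the renderer and optimises roughness, metalness and normals jointly.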
2.1 Lighting
We implement a GGX microfacet BRDF, as described in [Karis 2013].
We can then accurately light our base geometry, given the material
texture variables, under a prefiltered HDR environment map of the
capture stage (seen in figure 2). We found it necessary to inhibit
strong specular effects in cloth areas, so we multiply the specular
term by a weighting factor, which we set up as a variable texture
map in the optimisation procedure.
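For reference, the specular term of such a GGX microfacet BRDF, in the Cook-Torrance form with the Schlick-GGX remapping popularised by [Karis 2013], can be sketched as below. The function names and the `cloth_weight` parameter (standing in for the paper's specular weighting texture) are illustrative, not the paper's code:

```python
import math

def d_ggx(n_dot_h, roughness):
    """GGX / Trowbridge-Reitz normal distribution, alpha = roughness^2."""
    a2 = roughness ** 4
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)

def f_schlick(v_dot_h, f0):
    """Schlick's approximation to the Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def g_smith(n_dot_v, n_dot_l, roughness):
    """Smith shadowing-masking with k = (roughness + 1)^2 / 8 [Karis 2013]."""
    k = (roughness + 1.0) ** 2 / 8.0
    g1 = lambda x: x / (x * (1.0 - k) + k)
    return g1(n_dot_v) * g1(n_dot_l)

def specular_ggx(n_dot_l, n_dot_v, n_dot_h, v_dot_h,
                 roughness, f0=0.04, cloth_weight=1.0):
    """Cook-Torrance specular term D*F*G / (4 (n.l)(n.v)), scaled by
    a per-texel weight that suppresses speculars on cloth."""
    d = d_ggx(n_dot_h, roughness)
    f = f_schlick(v_dot_h, f0)
    g = g_smith(n_dot_v, n_dot_l, roughness)
    return cloth_weight * d * f * g / (4.0 * n_dot_l * n_dot_v + 1e-7)
```

Written with scalar math for clarity; in the optimisation each dot product and parameter would be a PyTorch tensor so that gradients flow back to the texture maps.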
2.2 Image Loss
To ensure our candidate renders and camera feed line up in image
space, we use the same calibration procedure described in [Collet
et al. 2015]. We can then combine the camera extrinsics, such as
world-space position and rotation, with the camera intrinsics, such
as focal length, to produce a pinhole projection matrix. We also