Direct 3-D Shape Recovery from Image Sequence Based on Multi-scale Bayesian Network

Norio Tagawa, Junya Kawaguchi, Shoichi Naganuma, Kan Okubo
Tokyo Metropolitan University, 6-6 Asahigaoka, Hino-shi, Tokyo 191-0065, Japan
tagawa@sd.tmu.ac.jp

Abstract

We propose a new method for recovering a 3-D object shape from an image sequence. In order to recover high-resolution relative depth without using a complex Markov random field (MRF) that includes a line process, we construct a recovery algorithm based on a belief propagation scheme using a multi-scale Bayesian network. With this algorithm, the relative 3-D motion between a camera and an object can be determined together with relative depth, and the maximum a posteriori expectation-maximization (MAP-EM) algorithm is effectively used to determine a suitable approximation.

1. Introduction

We propose a method for obtaining 3-D depth information using a gradient-based scheme with two successive images. In this field of study, spatially dense and stable detection is strongly required [1]-[3], and the aperture problem and the alias problem need to be completely solved [4]. Usually, either local optimization or global optimization is used to avoid the aperture problem. To avoid the alias problem, components of low spatial frequency are extracted by low-pass filtering and used to compute optical flow. However, these techniques lower the resolution of the obtained optical flow and hence of the recovered relative depth.

In this study, we attempt to directly recover 3-D depth information without explicitly detecting optical flow, and apply a Bayesian network that extends in the resolution direction by decomposing the original image into multi-scale images. The unknown parameters are represented as nodes, as are the depth to be estimated and the observed image information. We call this graphical model a multi-scale Bayesian network.
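As a concrete illustration of the multi-scale decomposition mentioned above, the following sketch builds a Gaussian-style image pyramid by repeated low-pass filtering and downsampling. The function name and the 5-tap binomial kernel are our own illustrative choices; the paper does not specify its decomposition filter.

```python
import numpy as np

def gaussian_pyramid(img, levels):
    """Multi-scale decomposition sketch: each coarser level is the
    previous one blurred with a separable 5-tap binomial kernel
    (an approximation of a Gaussian low-pass) and downsampled by 2."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        # separable blur: convolve each column, then each row
        a = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, a)
        a = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, a)
        pyr.append(a[::2, ::2])  # subsample after low-pass to limit aliasing
    return pyr
```

The low-pass step before each subsampling is what suppresses the alias problem discussed above; the coarse levels then supply the low-frequency information propagated through the network.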
If the parameters, including the relative 3-D motion parameters, are determined in advance, the inference of depth in this network can be realized by Kalman filtering. In particular, for optical flow detection, Simoncelli [4] introduced a multi-scale Bayesian network that treats optical flow as a node with the parameters assumed to be known, and proposed a Kalman filter-based algorithm. In our study, we attempt to estimate the depth and the parameters simultaneously from the observations.

The parameters to be estimated are common to all multi-scale images, and hence we have to adopt a suitable approximation to simplify the inference. In most tractable approximations, the parameters are treated as independent between multi-scale images. However, the information for the parameters obtained in a low-resolution image is then not directly propagated; it is only implicitly propagated through the propagation of depth information. We propose a stable procedure using the maximum a posteriori expectation-maximization (MAP-EM) algorithm, which can directly propagate the parameters' information.

2. Gradient method for recovering depth

2.1. Projection model and optical flow

We use perspective projection as our camera-imaging model. The camera is fixed with an (X, Y, Z) coordinate system, where the viewpoint (lens center) is at the origin O and the optical axis is along the Z-axis. The projection plane (image plane) Z = 1 can be used without any loss of generality, which means that the focal length equals 1. A space point (X, Y, Z) on the object is projected to the image point (x, y) = (X/Z, Y/Z). At each (x, y), the optical flow [v_x, v_y]^T is formulated with the inverse depth d(x, y) \equiv 1/Z(x, y) and the camera's translational and rotational vectors u = [u_x, u_y, u_z]^T and r = [r_x, r_y, r_z]^T, respectively, as follows:

v_x = x y r_x - (1 + x^2) r_y + y r_z - (u_x - x u_z) d,   (1)
v_y = (1 + y^2) r_x - x y r_y - x r_z - (u_y - y u_z) d.   (2)

978-1-4244-2175-6/08/$25.00 ©2008 IEEE
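Equations (1) and (2) translate directly into code. The sketch below evaluates the flow at a single image point; the function name and the tuple interfaces are illustrative, not part of the paper.

```python
import numpy as np

def flow_from_motion(x, y, d, u, r):
    """Optical flow (v_x, v_y) at image point (x, y) under perspective
    projection with focal length 1, per Eqs. (1)-(2):
    d = 1/Z is the inverse depth, u = (u_x, u_y, u_z) the translation,
    r = (r_x, r_y, r_z) the rotation."""
    ux, uy, uz = u
    rx, ry, rz = r
    # rotational component plus depth-scaled translational component
    vx = x * y * rx - (1.0 + x**2) * ry + y * rz - (ux - x * uz) * d
    vy = (1.0 + y**2) * rx - x * y * ry - x * rz - (uy - y * uz) * d
    return vx, vy
```

Note that only the translational terms are multiplied by d: rotation produces flow that is independent of depth, which is why depth can only be recovered from the translational part of the motion.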