Constrained Regression for 3D Pose Estimation

Aydin Varol, CVLab, EPFL
Mathieu Salzmann, TTI Chicago
Pascal Fua, CVLab, EPFL
Raquel Urtasun, TTI Chicago

1 Introduction

Estimating the 3D pose of a non-rigid or articulated body from monocular images is one of the great challenges in computer vision. Recent advances have shown that exploiting available data via statistical learning techniques can help disambiguate the problem. Existing methods typically fall into two categories: those that learn a prior over possible poses and employ it in a generative approach [4, 6], and those that rely on a discriminative predictor to learn a mapping from image observations to 3D pose [5, 8]. Unfortunately, both approaches have their shortcomings. Generative methods require accurate initialization, and thus in practice tend to yield suboptimal solutions. The discriminative methods employed typically assume that the output dimensions are independent given the inputs, and are therefore only suited to cases where the outputs are weakly correlated. Furthermore, to be accurate, discriminative approaches require a sufficient number of training input-output pairs, which can be hard to obtain in the context of non-rigid pose estimation. In short, while discriminative methods could be used to initialize generative ones, their results are often not accurate enough, due to the lack of data and the violation of constraints between the output dimensions.

In this paper, we propose to learn a discriminative regressor whose prediction is encouraged to satisfy constraints between the output dimensions. While existing structured output techniques attempt to learn these constraints from data [7, 2, 1], we propose to exploit constraints known a priori. This gives us the ability to make use of unsupervised data, where only the inputs are available. This is of great interest for 3D pose estimation, since obtaining training 3D poses is much harder than obtaining training images.
As our results show, the learned discriminative predictor not only better satisfies the constraints on the unsupervised training data, but also improves prediction accuracy and constraint satisfaction on the test examples.

2 Constrained Discriminative Regression

Our goal is to train a discriminative regressor whose prediction satisfies a given set of constraints for any test input. To this end, we express learning as the minimization of a regularized loss function subject to constraints on the parameters of the predictor. More specifically, let L be the set of labeled training examples containing N pairs of inputs x_i ∈ R^m and associated continuous multi-dimensional labels y_i ∈ R^D. Let U be the set of unlabeled training examples containing V inputs x_j. The loss that we seek to minimize is then defined on the set L, and the constraints on the set U. This lets us write learning as the optimization problem

  minimize_W   Σ_{(x_i, y_i) ∈ L}  loss(Wφ(x_i), y_i) + λ‖W‖²_F
  subject to   C(Wφ(x_j)) = 0,   ∀ x_j ∈ U,                                  (1)

where W ∈ R^{D×d} is the parameter matrix that defines the prediction, φ(x): R^m → R^d is the feature map of the input x, C(Wφ(x)) encodes the constraints defined with respect to the prediction Wφ(x), and λ is the regularization weight.

In our experiments, we consider the case of the square loss and enforce quadratic constraints on the predictions. The optimization problem then becomes

  minimize_W   Σ_{i=1}^{N}  ‖Wφ(x_i) − y_i‖² + λ‖W‖²_F
  subject to   (Wφ(x_j))ᵀ F_k (Wφ(x_j)) = l_k²,   ∀ x_j ∈ U,  1 ≤ k ≤ K,    (2)

where K is the number of quadratic constraints imposed on each of the unlabeled examples, and each constraint is specified by a matrix F_k ∈ R^{D×D} and a constant l_k². To solve this optimization problem, we follow an approach similar to [3] and iteratively linearize the constraints using a first-order Taylor expansion.
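The linearization step can be made concrete as follows. A quadratic constraint g(y) = yᵀF_k y − l_k² with y = Wφ(x_j), expanded to first order around the current prediction y₀ = W₀φ(x_j), gives g(y) ≈ g(y₀) + ((F_k + F_kᵀ)y₀)ᵀ(y − y₀), which is linear in W. Below is a minimal numpy sketch of this expansion; the function name, the column-major vec(·) convention, and the data layout are our own illustrative choices, not the paper's.

```python
import numpy as np

def linearized_constraints(W0, Phi_u, F_list, l_sq):
    """Linearize each quadratic constraint (W phi)^T F_k (W phi) = l_k^2
    around the current estimate W0 via a first-order Taylor expansion.

    Returns (A, b) such that the linearized constraints read
    A @ vec(W) = b, with vec(.) stacking the columns of W.
    """
    rows, rhs = [], []
    for j in range(Phi_u.shape[1]):
        phi = Phi_u[:, j]                  # features of unlabeled x_j
        y0 = W0 @ phi                      # current prediction for x_j
        for F, l2 in zip(F_list, l_sq):
            c = (F + F.T) @ y0             # gradient of y^T F y at y0
            # g(y) ~= g(y0) + c^T (y - y0); setting g(y) = 0 with y = W phi
            # yields the linear constraint  c^T W phi = l2 + y0^T F y0.
            rows.append(np.kron(phi, c))   # row acting on vec(W)
            rhs.append(l2 + y0 @ F @ y0)
    return np.array(rows), np.array(rhs)
```

Stacking one such row per constraint and per unlabeled example yields a linear system A vec(W) = b in which the subsequent weight update is carried out.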
We then optimize for the weights W that minimize the regularized loss function while remaining in the nullspace of the linearized constraints. Note that, with the square loss, this second step can be performed in closed form.
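One way to sketch this closed-form step is to parameterize vec(W) = w_p + Zθ, where w_p is a particular solution of the linearized constraints A vec(W) = b and the columns of Z span the nullspace of A; the regularized square loss is then minimized over the reduced coordinates θ by solving a single linear system. The numpy sketch below uses the same illustrative conventions as above and is one plausible reading of the paper's step, not its exact implementation.

```python
import numpy as np

def constrained_ridge_step(Phi_l, Y, lam, A, b):
    """Minimize sum_i ||W phi_i - y_i||^2 + lam ||W||_F^2
    subject to the linearized constraints A vec(W) = b,
    with vec(.) column-major so that vec(W Phi) = (Phi^T kron I_D) vec(W)."""
    d, N = Phi_l.shape
    D = Y.shape[0]
    M = np.kron(Phi_l.T, np.eye(D))          # maps vec(W) to stacked predictions
    y = Y.flatten(order='F')                 # stacked targets
    # Particular solution of A w = b and an orthonormal nullspace basis Z.
    w_p = np.linalg.lstsq(A, b, rcond=None)[0]
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10 * s[0]))
    Z = Vt[rank:].T                          # columns span null(A)
    # Closed-form ridge solve in the reduced coordinates: w = w_p + Z theta.
    H = Z.T @ (M.T @ M + lam * np.eye(M.shape[1])) @ Z
    g = Z.T @ (M.T @ (y - M @ w_p) - lam * w_p)
    theta = np.linalg.solve(H, g)
    return (w_p + Z @ theta).reshape(D, d, order='F')
```

Iterating the two steps — re-linearizing the constraints at the current W, then re-solving this reduced ridge problem — implements the overall scheme described above.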