Supplementary Material for PlanIt: A Crowdsourcing Approach for Learning to Plan Paths from Large Scale Preference Feedback

Ashesh Jain, Debarghya Das, Jayesh K. Gupta and Ashutosh Saxena

Fig. 1: An environment with three instances of the watching activity.

Fig. 2: Generative process for modeling the user preference data.

I. GENERATIVE MODEL: LEARNING THE PARAMETERS

Given user preference data from PlanIt, we learn the model parameters. Since our goal was to make data collection easier for users, the labels we get are bad, neutral, or good for a particular segment of the video. The challenge is that we do not know which activity $a$ is being affected by a given waypoint $t_i$ during feedback. A waypoint could even be influencing multiple activities. For example, in Fig. 1 a waypoint passing between the human and the TV could affect multiple watching activities. We therefore define a latent random variable $z_a^i \in \{0, 1\}$ for waypoint $t_i$, such that $p(z_a^i|E)$ (or $\eta_a$) is the (prior) probability of the user data arising from activity $a$. Incorporating this parameter gives the following cost function:

$$\Psi(\{t_1, \ldots, t_k\}\,|\,E) = \sum_{i=1}^{k} \underbrace{\sum_{a \in A_E} p(z_a^i|E)\, \Psi_a(t_i|E)}_{\text{marginalizing latent variable } z_a^i} \quad (1)$$

where $A_E$ is the set of activities in environment $E$.¹ Figure 2 shows the generative process for the preference data.

Training data: We obtain user preferences over $n$ environments $E_1, \ldots, E_n$. For each environment $E$ we consider $m$ trajectory segments $T_{E,1}, \ldots, T_{E,m}$ labeled as bad by users. For each segment $T$ we sample $k$ waypoints $\{t_{T,1}, \ldots, t_{T,k}\}$.

A. Jain, D. Das, J. K. Gupta and A. Saxena are with the Department of Computer Science, Cornell University, USA. ashesh@cs.cornell.edu, dd367@cornell.edu, jkg76@cornell.edu, asaxena@cs.cornell.edu

¹ We extract the information about the environment and activities by querying OpenRAVE. In practice and in the robotic experiments, human activity information can be obtained using the software package by Koppula et al. [1].
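The marginalization in Eq. (1) can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation: the names `priors` and `affordance_costs` are hypothetical, with `priors[a]` standing in for $p(z_a^i|E)$ and each entry of `affordance_costs` standing in for a per-activity cost $\Psi_a(\cdot|E)$.

```python
def trajectory_cost(waypoints, priors, affordance_costs):
    """Marginalized cost of Eq. (1): for each waypoint t_i, sum the
    per-activity costs Psi_a(t_i|E) weighted by the prior p(z_a^i|E),
    then sum over all waypoints in the segment.

    waypoints        : list of waypoint descriptors t_i
    priors           : dict mapping activity -> prior p(z_a|E)
    affordance_costs : dict mapping activity -> callable Psi_a(t|E)
    """
    total = 0.0
    for t in waypoints:
        # Inner sum over a in A_E: marginalize the latent assignment z_a^i.
        total += sum(priors[a] * cost(t) for a, cost in affordance_costs.items())
    return total
```

Because the latent activity assignment is summed out, no hard decision about which activity a waypoint affects is needed at this stage; that soft assignment is recovered later in the E-step.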
We use $\Theta \in \mathbb{R}^{30}$ to denote the model parameters and solve the following maximum likelihood problem:

$$\Theta^* = \arg\max_{\Theta} \sum_{i=1}^{n} \sum_{j=1}^{m} \Psi(T_{E_i,j}|E_i;\Theta) = \arg\max_{\Theta} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{l=1}^{k} \sum_{a \in A_{E_i}} p(z_a^l|E_i;\Theta)\, \Psi_a(t_{T_{E_i,j},l}|E_i;\Theta) \quad (2)$$

Eq. (2) does not have a closed-form solution. We follow the Expectation-Maximization (EM) procedure to learn the model parameters. In the E-step we calculate the posterior activity assignment $p(z_a^l|t_{T_{E_i,j},l}, E_i)$ for all the waypoints, and in the M-step we update the parameters.

E-step: In this step, keeping the model parameters fixed, we find the posterior probability of a waypoint $t$ affecting an activity $a$:

$$p(z_a|t,E;\Theta) = \frac{p(z_a|E;\Theta)\,\Psi_a(t|E;\Theta)}{\sum_{a' \in A_E} p(z_{a'}|E;\Theta)\,\Psi_{a'}(t|E;\Theta)} \quad (3)$$

We calculate this posterior for every waypoint $t$ in our data.

M-step: Using the posterior from the E-step, we update the model parameters in this step. Our affordance representation consists of three distributions, namely: Gaussian, von Mises, and Beta. The parameters of the Gaussian, and the mean ($\mu$) of the von Mises, are updated in closed form. Following Sra [2], we perform a first-order approximation to update the concentration ($\kappa$) of the von Mises. The parameters of the Beta distribution ($\alpha$ and $\beta$) are approximated using the first- and second-order moments of the data.

Estimating von Mises distribution parameters: The von Mises distribution is parameterized by a mean direction $\mu$ and a concentration $\kappa$. The mean for an activity $a$ has a closed-form update expression:

$$\mu_a = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(z_a^l|t_{T_{E_i,j},l},E_i)\, x_{t_{T_{E_i,j},l}}}{\left\| \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(z_a^l|t_{T_{E_i,j},l},E_i)\, x_{t_{T_{E_i,j},l}} \right\|} \quad (4)$$

However, updating $\kappa$ is not straightforward. We follow the first-order approximation by Sra [2] and update $\kappa$ as follows:

$$\kappa_a = \frac{\bar{R}(2 - \bar{R}^2)}{1 - \bar{R}^2} \quad (5)$$

where

$$\bar{R} = \frac{\left\| \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(z_a^l|t_{T_{E_i,j},l},E_i)\, x_{t_{T_{E_i,j},l}} \right\|}{\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(z_a^l|t_{T_{E_i,j},l},E_i)} \quad (6)$$
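One EM iteration for the von Mises part of the model can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's code: it assumes planar unit vectors $x_t$ (an $N \times 2$ array, with the triple sum over $i,j,l$ flattened into the row index $N$), uses the von Mises density as $\Psi_a$, and all function and variable names are illustrative.

```python
import numpy as np

def e_step(x, priors, mus, kappas):
    """Posterior p(z_a | t, E) of Eq. (3) for N planar unit vectors x.

    x      : (N, 2) unit vectors, one per waypoint
    priors : (A,) activity priors p(z_a | E)
    mus    : (A, 2) von Mises mean directions (unit vectors)
    kappas : (A,) von Mises concentrations
    Returns an (N, A) matrix whose rows sum to 1.
    """
    # von Mises density: exp(kappa * mu^T x) / (2*pi*I0(kappa)).
    # The normalizer does not cancel in Eq. (3) when kappa differs
    # across activities, so we keep it.
    dens = np.stack(
        [priors[a] * np.exp(kappas[a] * (x @ mus[a])) / (2 * np.pi * np.i0(kappas[a]))
         for a in range(len(mus))],
        axis=1)                                    # shape (N, A)
    return dens / dens.sum(axis=1, keepdims=True)  # normalize over activities

def m_step_von_mises(x, post):
    """Mean update of Eq. (4) and Sra's first-order kappa update,
    Eqs. (5)-(6), from posterior weights post (N, A)."""
    mus, kappas = [], []
    for a in range(post.shape[1]):
        s = (post[:, a:a+1] * x).sum(axis=0)        # weighted resultant vector
        mus.append(s / np.linalg.norm(s))           # Eq. (4): normalize to unit length
        R = np.linalg.norm(s) / post[:, a].sum()    # Eq. (6): mean resultant length
        kappas.append(R * (2 - R**2) / (1 - R**2))  # Eq. (5): first-order approximation
    return np.array(mus), np.array(kappas)
```

Alternating these two functions until the parameters stabilize gives the EM loop described above; the Gaussian and Beta parameters would be updated in the same M-step using the same posterior weights.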