Optimal navigation for vehicles with stochastic dynamics

Shridhar K. Shah, Herbert G. Tanner, and Chetan D. Pahlajani

Abstract—This paper presents a framework for input-optimal navigation under state constraints for vehicles exhibiting stochastic behavior. The resulting stochastic control law is implementable in real time on vehicles with limited computational power. When control actuation is unconstrained, convergence with probability one can be guaranteed theoretically. When inputs are bounded, the probability of convergence is quantifiable. Experimental implementation on a 5.5 g, 720 MHz processor that controls a bio-inspired crawling robot with stochastic dynamics corroborates the design framework.

Keywords — robot navigation; stochastic control; path integrals

I. INTRODUCTION

Miniature robots can exhibit stochastic behavior for many reasons: environmental perturbations, unmodeled compliance, ground interactions, and battery charge fluctuations. Deterministic feedback navigation strategies cannot offer guarantees of convergence or obstacle avoidance when applied to systems with stochastic dynamics [3]. At the miniature scale in particular, and in conjunction with limited on-board power storage and computation capabilities, uncertainty about the system's position can be significant enough to prevent completion of the assigned mission. Not only should uncertainty be accounted for during control design; given that power density is limited at this scale, actuation effort must also be applied sparingly.

Within an optimal control framework, uncertainty can be accounted for either by deriving worst-case bounds [4]–[6] or by employing stochastic models. The latter are comparatively more flexible and less conservative. One can, for example, adjust the probability that problem constraints are violated, admitting solutions that would otherwise be infeasible.
However, existing methods for constrained stochastic optimal control are too computationally demanding for real-time implementation [7]–[9]. For deterministic systems, motion planning is now mature [10], and existing methods can quickly produce, in an open-loop fashion, waypoint sequences that connect start to goal. Systems with stochastic dynamics, however, may not be able to realize these open-loop plans. When dynamics are discrete-time linear, extensions of the classical rapidly-exploring random tree (RRT) that account for probabilistic uncertainty and constraints (chance constraints) during planning can be applied [11], [12]. The computational complexity of such chance-constrained planners [11]–[13], however, is still prohibitive for platforms at the low end of the processor frequency range.

To rein in computational complexity, receding horizon control schemes have been used on stochastic linear discrete-time systems [13], [14]. For linearizable nonlinear discrete-time systems, an iterative LQG method [15] offers a solution when the cost is quadratic.

Work supported by ARL MAST CTA # W911NF-08-2-0004. Shah is with Mathworks Inc., shridhar.shah@mathworks.com. Tanner is with the Department of Mechanical Engineering, University of Delaware, Newark, DE, USA, btanner@udel.edu. Pahlajani is with the Department of Mathematics, IIT Gandhinagar, India, cdpahlajani@iitgn.ac.in. Portions of this work have been previously presented at ICRA 2012 [1] and at SPIE 2013 [2]. The former conference paper dealt with systems without a control-multiplicative term and with unbounded inputs in a linear setting. The latter presented experimental results that corroborated the theoretical predictions of the former. This paper deals with systems with a control-multiplicative term, considers the case of bounded inputs, and extends the theoretical study to address well-posedness and existence of solutions, offer proofs of convergence, and include comparative studies.
Particle filter approximations have been used in conjunction with chance-constrained model predictive control for linear systems with probabilistic noise [13]. Other methods combine a hybrid density filter with dynamic programming [7]. Similar problems have been approached from a hybrid systems perspective in the context of a reach-avoid formulation [16], offering solutions that scale up to three dimensions. Continuous-time stochastic optimal control formulations do not fare much better: numerical approximation methods are applied for the calculation of path integrals [17], and different applications of the formulation have been explored in conjunction with reinforcement learning [18] and risk-sensitive control [9].

This paper suggests the sequential application of pre-computed, real-time implementable, locally optimal feedback strategies that enable the system to evolve stochastically from one waypoint to the next, all the way to its final destination. The approach is inspired by input-optimal exit-time stochastic optimal control formulations [19], [20]. This paper extends the aforementioned formulation to capture larger classes of systems and adapts it to a waypoint navigation problem. It solves this problem by closing the loop through feedback controllers, shows that the resulting control system is a well-defined stochastic hybrid system, and formally establishes its convergence properties. In addition to these theoretical contributions, the paper shows how the control laws can be computed in computationally efficient ways that allow real-time application on low-speed processors, and investigates the effect of input saturation. Compared to related stochastic optimal control frameworks, the one reported here applies directly to nonlinear systems (cf. [13]), without propagating probability densities (cf. [7]).
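To make the path-integral idea referenced above concrete, the sketch below estimates a "desirability" function by Monte Carlo, in the spirit of Feynman-Kac-based numerical methods: the value at a state is the expectation of an exponentiated terminal cost over uncontrolled noisy paths. The one-dimensional dynamics, quadratic terminal cost, temperature `lam`, and all numerical parameters are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def desirability(q0, sigma=0.1, goal=0.0, dt=1e-2, T=1.0,
                 samples=500, lam=1.0, rng=None):
    """Monte Carlo (Feynman-Kac style) estimate of
    Psi(q0) = E[exp(-terminal_cost / lam)] over uncontrolled Brownian paths.
    Dynamics, cost, and parameters here are placeholder assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    steps = int(T / dt)
    # Propagate a batch of sample paths: dq = sigma * dW (no control).
    q = np.full(samples, q0, dtype=float)
    for _ in range(steps):
        q += sigma * np.sqrt(dt) * rng.normal(size=samples)
    terminal_cost = (q - goal) ** 2
    return np.exp(-terminal_cost / lam).mean()
```

Because the expectation is over independent sample paths, the estimate at each grid point can be precomputed off-line and stored, which is consistent with the pre-computation strategy the paper advocates for low-power processors.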
It is developed along a similar philosophy as other path integral methods [9], but being built within an exit-time optimal formulation, it yields control law expressions that do not depend on a given final time. This exit-time formulation allows solutions to be obtained orders of magnitude faster (e.g., 18 vs. 5000 seconds of CPU time) compared to alternative path integral solutions [9]. After some off-line pre-computation, these solutions can be implemented in real time on systems with up to six states.

In Section II we define a waypoint following problem, which can be mathematically encoded as an exit-time stochastic optimal control problem. Section III demonstrates that the optimal control problem of transitioning from waypoint to waypoint is associated with a partial differential equation (PDE), which can be solved numerically by leveraging the Feynman-Kac formula. In Section IV, we show that the resulting closed-loop system is essentially a special case of a Markov string with well-defined solutions, and we prove that its convergence can be guaranteed. Section V assesses the performance of the closed-loop system in terms of computational complexity and optimality, and demonstrates the scheme's applicability to miniature vehicles driven by low-end processors.

II. PROBLEM STATEMENT

Consider the motion of vehicles with dynamics that can be represented as a stochastic process:

    dq = b(q) dt + G(q) u(i, q) dt + Σ(q) dW,    q(0) = q0,    (1)

where q ∈ R^n is the state, b : R^n → R^n is the drift term, G : R^n → R^{n×m} is the matrix of control vector fields, i ∈ I = {0, 1, ..., N}
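A model of the form (1) can be simulated with a standard Euler-Maruyama discretization, which is how sample paths of such vehicle dynamics are typically generated numerically. The sketch below is illustrative only: the drift `b`, control matrix `G`, noise matrix `Sigma`, and the feedback `u` (with the waypoint index `i` dropped for simplicity) are placeholder assumptions, not the controllers derived in this paper.

```python
import numpy as np

def euler_maruyama(b, G, Sigma, u, q0, dt=1e-3, steps=1000, rng=None):
    """Simulate dq = b(q) dt + G(q) u(q) dt + Sigma(q) dW
    with the Euler-Maruyama scheme; returns the sampled path."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.array(q0, dtype=float)
    path = [q.copy()]
    for _ in range(steps):
        dW = rng.normal(scale=np.sqrt(dt), size=q.shape)  # Brownian increment
        q = q + (b(q) + G(q) @ u(q)) * dt + Sigma(q) @ dW
        path.append(q.copy())
    return np.array(path)

# Illustrative planar single integrator with simple stabilizing feedback
# toward the origin (all four maps are assumptions for this sketch).
b = lambda q: np.zeros_like(q)        # no drift
G = lambda q: np.eye(2)               # fully actuated
Sigma = lambda q: 0.1 * np.eye(2)     # constant diffusion
u = lambda q: -q                      # linear state feedback
path = euler_maruyama(b, G, Sigma, u, q0=[1.0, 1.0], steps=2000,
                      rng=np.random.default_rng(0))
```

Under these placeholder dynamics the state contracts toward the origin while the diffusion term keeps it fluctuating around it, which is exactly the behavior that motivates quantifying convergence in probability rather than deterministically.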