Self-supervised Exposure Trajectory Recovery for Dynamic Blur Estimation

Youjian Zhang†, Chaoyue Wang†, Stephen J. Maybank‡, Dacheng Tao†
†UBTECH Sydney AI Centre, Faculty of Engineering, The University of Sydney
‡Department of Computer Science and Information Systems, Birkbeck College

Abstract

Dynamic scene blurring is an important yet challenging topic. During the exposure of a dynamic scene, camera sensors record both latent sharp content and complex motions (e.g., camera shake, object motion, and deformation); a dynamic blurry image thus represents an exposure result accumulated over a period of time. Benefiting from their powerful fitting capacity, deep learning-based methods have achieved impressive performance on dynamic scene deblurring. However, the time-dependent motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) ground-truth blurry motion is difficult to obtain and represent; (ii) the temporal ordering of the motion is destroyed during the accumulation process; and (iii) like blur removal, dynamic motion estimation is highly ill-posed. By revisiting the principle of camera exposure, dynamic blur can be described by the relative motion of the sharp content with respect to each exposed pixel. This understanding motivates us to define exposure trajectories, which record these relative motions during an exposure period, to represent the motion information contained in a blurry image and explain the causes of dynamic blur. We propose a new blur representation, which we call the motion offset, to model pixel-wise displacements of the latent sharp image at multiple time points. Under mild assumptions/constraints, the learned motion offsets can recover dense, (non-)linear exposure trajectories, which significantly reduce the temporal ambiguity and ill-posedness of the problem.
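The accumulation view of exposure described above can be sketched numerically: a blurry image is the temporal average of the sharp image displaced along an exposure trajectory. The sketch below is purely illustrative, assuming a global integer-shift warp and a toy three-point linear trajectory; the actual model uses per-pixel, sub-pixel motion offsets.

```python
import numpy as np

def accumulate_blur(sharp, trajectory):
    """Synthesize a blurry image as the temporal average of the sharp
    image displaced along an exposure trajectory (hypothetical
    global-shift simplification of the accumulation model)."""
    acc = np.zeros_like(sharp, dtype=np.float64)
    for dy, dx in trajectory:
        # np.roll stands in for a warping operator; real motion offsets
        # are sub-pixel and spatially varying.
        acc += np.roll(sharp, (dy, dx), axis=(0, 1))
    return acc / len(trajectory)

# Toy sharp image: a single bright point.
sharp = np.zeros((9, 9))
sharp[4, 4] = 1.0
# A linear horizontal exposure trajectory sampled at 3 time points.
trajectory = [(0, -1), (0, 0), (0, 1)]
blurry = accumulate_blur(sharp, trajectory)
# The point is smeared evenly across three pixels; total energy is conserved.
```

Note that averaging over the trajectory destroys its ordering (reversing `trajectory` yields the same `blurry`), which is exactly the temporal ambiguity (ii) mentioned in the abstract.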
Finally, we demonstrate that the estimated exposure trajectories fit real-world dynamic blurs and further contribute to motion-aware image deblurring and warping-based video extraction from a single blurry image. Comprehensive experiments on benchmarks and challenging real-world cases demonstrate the superiority of the proposed framework over state-of-the-art methods. More video results can be found in our supplementary video.

1 Introduction

Dynamic scene blurring caused by camera shake, object motion, or depth variation is one of the most common image degradations. Estimating motion information and restoring sharp content in dynamic blurry images would benefit many real-world applications, including segmentation, detection, and recognition. Benefiting from the powerful fitting ability of deep convolutional neural networks (CNNs), deep learning-based deblurring methods [23, 9, 51, 30] have achieved impressive performance for dynamic blur removal. Nevertheless, exploiting the dynamic information in blurry images remains an academic and commercial challenge.

Most conventional blur removal methods are based on blur kernel estimation [8, 15, 42, 21, 13, 14, 29], which assumes that a blurry area can be represented as a weighted sum of its latent sharp surrounding content. A blur kernel is a weight matrix that is convolved with a sharp image patch to synthesize a blurry pixel. Conversely, blur kernel estimation is cast as an energy minimization problem that aims to recover both the blur kernels and the latent sharp image from a blurry image. Such optimizations are highly ill-posed, so most conventional methods are restricted by assumptions about motion types and predefined image priors. For example, [42, 12, 46, 54] only handle blur caused by camera rotations, in-plane translations, or forward out-of-plane translations. For more complex dynamic blur, identifying a suitably informative and general prior is extremely difficult.
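The blur-kernel view above can be illustrated with a minimal sketch: a blurry pixel is the weighted sum of its sharp neighborhood under a normalized kernel. The `blur_pixel` helper and the 3x3 horizontal-motion kernel are hypothetical examples for illustration, not a method from any of the cited works.

```python
import numpy as np

def blur_pixel(sharp_patch, kernel):
    """A blurry pixel as the weighted sum of its sharp surrounding
    patch, with weights given by a normalized blur kernel."""
    assert abs(kernel.sum() - 1.0) < 1e-9, "blur kernels sum to 1"
    return float((sharp_patch * kernel).sum())

# Hypothetical kernel for a short horizontal motion blur.
kernel = np.array([[0.0, 0.0, 0.0],
                   [1/3, 1/3, 1/3],
                   [0.0, 0.0, 0.0]])
patch = np.arange(9, dtype=float).reshape(3, 3)  # sharp 3x3 neighborhood
b = blur_pixel(patch, kernel)  # averages the middle row: (3 + 4 + 5) / 3
```

Kernel estimation then inverts this forward model: given only blurry pixels like `b`, an energy minimization seeks both `kernel` and `patch`, which is why the problem is ill-posed without additional priors.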
arXiv:2010.02484v1 [cs.CV] 6 Oct 2020