A Bayesian Approach to Adaptive Video Super Resolution

Ce Liu, Microsoft Research New England
Deqing Sun, Brown University

Figure 1. (a) Input low-res; (b) bicubic up-sampling ×4; (c) output from our system; (d) original frame. Our video super resolution system is able to recover image details after ×4 up-sampling.

Abstract

Although multi-frame super resolution has been extensively studied in past decades, super resolving real-world video sequences remains challenging. In existing systems, either the motion models are oversimplified, or important factors such as the blur kernel and noise level are assumed to be known. Such models cannot deal with scene and imaging conditions that vary from one sequence to another. In this paper, we propose a Bayesian approach to adaptive video super resolution that simultaneously estimates the underlying motion, blur kernel and noise level while reconstructing the original high-res frames. As a result, our system not only produces very promising super resolution results that outperform the state of the art, but also adapts to a variety of noise levels and blur kernels. We also provide a theoretical analysis of the relationship between the blur kernel, the noise level and the frequency-wise reconstruction rate, which is consistent with our experimental results.

1. Introduction

Multi-frame super resolution, namely estimating high-res frames from a low-res sequence, is one of the fundamental problems in computer vision and has been extensively studied for decades. The problem becomes particularly interesting as high-definition devices such as HDTVs dominate the market: there is a great need for converting low-res, low-quality videos into high-res, noise-free videos that can be pleasantly viewed on HDTVs. Although much progress has been made in the past 30 years, super resolving real-world video sequences remains an open problem.
Most previous work assumes that the underlying motion has a simple parametric form, and/or that the blur kernel and noise levels are known. In reality, however, the motion of objects and cameras can be arbitrary, the video may be contaminated with noise of unknown level, and motion blur and point spread functions can lead to an unknown blur kernel. A practical super resolution system should therefore simultaneously estimate optical flow [9], noise level [18] and blur kernel [12] in addition to reconstructing the high-res frames. As each of these problems has been well studied in computer vision, it is natural to combine all of these components in a single framework without making oversimplified assumptions.

In this paper, we propose a Bayesian framework for adaptive video super resolution that incorporates high-res image reconstruction, optical flow, noise level and blur kernel estimation. Using a sparsity prior for the high-res image, flow fields and blur kernel, we show that the super resolution computation reduces to each component problem when the other factors are known, and that MAP inference iterates between optical flow, noise estimation, blur estimation and image reconstruction. As shown in Figure 1 and later examples, our system produces promising results on challenging real-world sequences despite various noise levels and blur kernels, accurately reconstructing both major structures and fine texture details. In-depth experiments demonstrate that our system outperforms the state-of-the-art super resolution systems [1, 23, 25] on challenging real-world sequences.

We are also interested in theoretical aspects of super resolution, namely to what extent the original high-res information can be recovered under a given condition.
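Before turning to the theoretical question, the alternating MAP inference described above can be sketched in a heavily simplified form. The following toy example is hypothetical and is not the paper's implementation: it uses a 1-D signal, holds the blur kernel fixed and known, omits motion entirely, and replaces the sparsity prior with a quadratic smoothness prior. What it retains is the coordinate-descent structure, alternating between reconstructing the signal and re-estimating the noise level from the residual, with the estimated noise level controlling the strength of the prior.

```python
import numpy as np

def blur(x, kernel):
    return np.convolve(x, kernel, mode="same")

def reconstruct(y, kernel, sigma2, lam=10.0, iters=300):
    # Gradient descent on ||Kx - y||^2 + lam * sigma2 * ||grad x||^2:
    # a larger estimated noise level sigma2 means stronger smoothing.
    x = y.copy()
    step = 1.0 / (1.0 + 4.0 * lam * sigma2)    # safe step: 4 bounds the Laplacian's top eigenvalue
    for _ in range(iters):
        data_grad = blur(blur(x, kernel) - y, kernel[::-1])
        smooth_grad = lam * sigma2 * np.convolve(x, [-1.0, 2.0, -1.0], mode="same")
        x -= step * (data_grad + smooth_grad)
    return x

def estimate_noise(y, kernel, x):
    # Noise level as the mean squared residual of the current estimate.
    return float(np.mean((y - blur(x, kernel)) ** 2))

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 4.0 * np.pi, 128))
kernel = np.array([0.25, 0.5, 0.25])           # blur held fixed in this toy
y = blur(truth, kernel) + 0.05 * rng.standard_normal(128)

sigma2 = 1.0                                   # deliberately wrong initial guess
for _ in range(5):                             # coordinate-descent MAP iterations
    x = reconstruct(y, kernel, sigma2)
    sigma2 = estimate_noise(y, kernel, x)
```

Even with a badly initialized noise level, the alternation settles near the true noise variance, and the regularization strength adapts accordingly; the full framework additionally updates the flow fields and the blur kernel in the same loop.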
Although previous work [3, 15] on the limits of super resolution provides important insights into the increasing difficulty of recovering the signal as a function of the up-sampling factor, most of the bounds are obtained for the entire signal, ignoring the frequency perspective. Intuitively, high-frequency components of the original image become much harder to recover as the blur kernel, noise level and/or up-sampling factor increases. We use Wiener filtering theory to analyze the
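The frequency-wise intuition can be illustrated with a small hypothetical sketch (not the analysis developed in the paper). For a blur with optical transfer function H(f), white noise of variance σ², and signal power S(f), the combined response of the Wiener filter and the blur is |H(f)|² / (|H(f)|² + σ²/S(f)), which can be read as a per-frequency reconstruction rate: it falls as frequency, blur width or noise level grows.

```python
import numpy as np

def recovery_rate(freqs, blur_sigma, noise_var, signal_power=1.0):
    # Gaussian blur OTF H(f) and the Wiener-filter reconstruction rate
    # |H|^2 / (|H|^2 + noise_var / signal_power), per frequency.
    H = np.exp(-2.0 * (np.pi * freqs * blur_sigma) ** 2)
    return H ** 2 / (H ** 2 + noise_var / signal_power)

freqs = np.linspace(0.0, 0.5, 64)          # cycles/sample, up to Nyquist
low_noise = recovery_rate(freqs, blur_sigma=1.0, noise_var=1e-4)
high_noise = recovery_rate(freqs, blur_sigma=1.0, noise_var=1e-2)
# The rate decreases with frequency, and more noise lowers it everywhere.
```

Under this flat-signal-power assumption, low frequencies are recovered almost perfectly while the rate near Nyquist collapses as the noise variance grows, matching the intuition that high-frequency detail is the first casualty of noise and blur.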