Comparison of Semantic Segmentation Approaches for Horizon/Sky Line Detection

Touqeer Ahmad*, Pavel Campr†, Martin Čadík‡, George Bebis*

† University of West Bohemia, Pilsen, Czech Republic, campr@kky.zcu.cz
* Department of Computer Science and Engineering, University of Nevada, Reno, USA, tahmad@nevada.unr.edu, bebis@cse.unr.edu
‡ Brno University of Technology, Faculty of Information Technology, Centre of Excellence IT4Innovations, Czech Republic, cadik@fit.vutbr.cz

Abstract—Horizon or skyline detection plays a vital role in mountainous visual geo-localization; however, most of the recently proposed visual geo-localization approaches rely on user-in-the-loop skyline detection methods. Detecting such a segmenting boundary fully autonomously would be a clear step forward for these localization approaches. This paper provides a quantitative comparison of four methods for autonomous horizon/sky line detection on an extensive data set. Specifically, we compare four recently proposed segmentation methods: one explicitly targeting horizon detection [2], a second focused on visual geo-localization but relying on accurate skyline detection [15], and two others proposed for general semantic segmentation, namely Fully Convolutional Networks (FCN) [21] and SegNet [22]. The first two methods are trained on a common training set [11] of about 200 images, while the models for the third and fourth methods are fine-tuned for the sky segmentation problem through transfer learning on the same data set. Each method is tested on an extensive test set (about 3K images) covering challenging geographical, weather, illumination and seasonal conditions. We report the average accuracy and the average absolute pixel error for each of the presented formulations.

I. INTRODUCTION

With the massive availability of geo-tagged imagery and increased computational power, geo-localization/geolocation has attracted a lot of attention from researchers in the computer vision and image retrieval communities. Significant progress has been made in urban environments with stable man-made structures and geo-referenced street imagery of frequently visited tourist attractions [18], [19], [20]. Recently, some attempts have been made toward geo-localization of natural/mountain scenes, which is more challenging due to changing vegetation, lighting and seasonal conditions, and the lack of geo-tagged imagery. Typical approaches for mountain/natural geo-localization rely on mountain peak and valley information, visible skylines, ridges, or combinations of all three [10], [11], [12], [13], [24], [14], [15]. The sky/horizon line has been established as a robust natural feature of mountainous images that can be matched with synthetic skylines generated from publicly available terrain maps, namely Digital Elevation Models (DEMs). Hence, the very first step in the geolocation pipeline for mountainous regions is to find the skyline in the given query image. However, most of the solutions for mountainous geo-localization rely on user-in-the-loop methods for skyline extraction, where a user is required to mark/correct a portion of the sky/horizon line [11], [12], [14], [15]. In addition to visual geo-localization and mountain image annotation/tagging, the sky/horizon line has proven useful for various other applications, e.g., UAV navigation [23], [5], [17], [9], [6], [7], vehicle navigation [16], augmented reality [13] and port security [8]. It should be noted that most of the earlier horizon/sky line detection approaches assume the horizon to be a linear boundary; the Hough transform was generally employed to find the line parameters with respect to some cost function [6], [5], [7], [8], [25].
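The classical linear-horizon approach mentioned above can be illustrated with a plain Hough vote over straight lines. This is a minimal sketch, not the method of any cited work: it assumes a binary edge map is already available (from any edge detector) and uses the raw vote count as the cost function, whereas [5]–[8], [25] each define their own costs. The function names `hough_horizon_line` and `line_row_at` are hypothetical.

```python
import numpy as np


def hough_horizon_line(edges, n_theta=180):
    """Return (rho, theta) of the straight line with the most votes in a binary edge map.

    Lines use the normal form rho = x*cos(theta) + y*sin(theta), with x the column
    index and y the row index. Every edge pixel votes once for each discretized
    theta; the accumulator cell with the most votes is returned.
    (Hypothetical helper; the vote count stands in for a method-specific cost.)
    """
    ys, xs = np.nonzero(edges)  # row (y) and column (x) indices of edge pixels
    if xs.size == 0:
        raise ValueError("edge map contains no edge pixels")
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))  # upper bound on |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # rho for every theta, shifted by diag so it is a valid non-negative row index
        rho_idx = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rho_idx, cols] += 1  # each (rho, theta) pair is hit at most once per pixel
    r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r_idx) - diag, float(thetas[t_idx])


def line_row_at(rho, theta, x):
    """Row (y) of the line (rho, theta) at column x; undefined for vertical lines."""
    s = np.sin(theta)
    if abs(s) < 1e-12:
        raise ValueError("vertical line has no unique row per column")
    return (rho - x * np.cos(theta)) / s


# Hypothetical usage: a synthetic 50x80 edge map with a flat horizon at row 20.
edges = np.zeros((50, 80), dtype=bool)
edges[20, :] = True
rho, theta = hough_horizon_line(edges)
print(rho, np.degrees(theta))
```

On this synthetic input the sketch recovers rho = 20 with theta close to 90 degrees, i.e. a horizontal line at row 20. A single (rho, theta) pair can only describe a straight boundary, so this representation cannot capture the jagged skylines of mountainous scenes.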
Although a linear horizon boundary may suffice for UAV navigation, ship detection and/or port security, a non-linear sky segmentation is essential for geo-localization and is hence the focus of this paper.

arXiv:1805.08105v1 [cs.CV] 21 May 2018

A. Related Work – Mountainous Geo-Localization

Using silhouette edge matching, Baboud et al. [10] estimate the pose of the camera relative to a geometric terrain model (DEM), assuming known viewpoint and field-of-view (FOV) estimates. Effectively, they search for a rotation g ∈ SO(3) that maps the camera frame to the terrain frame. They developed a robust silhouette matching metric to cope with the inevitable noise affecting the detected edges (a compass edge detector is used). Since a direct exhaustive search over SO(3) using this metric is expensive, they also proposed a preprocessing step that reduces the search space based on spherical cross-correlation of 2D edge orientation vectors. They reported that 86% of 28 images, belonging to two distinct mountain regions, were correctly aligned with a matching error below 0.2°. Baatz et al. [11] proposed a visual geo-localization pipeline based on bag-of-curvelets, where shape information is aggregated across the whole skyline of a query image and a similar configuration of shapes is searched for in a large-scale database of panoramic skylines (extracted offline from DEMs). In addition to the encoded contourlets, the viewing direction of each descriptor is also stored and used for on-the-fly geometric verification in an inverted-file search framework. Since they compare 10°–70° views with 360° panoramas, they redefine the weighted L1-norm to implement “contains”-