3D No-reference Image Quality Assessment via Transfer Learning and Saliency-guided Feature Consolidation

XU XIAOGANG 1, BUFAN SHI 2, ZIJIN GU 3, RUIZHE DENG 1, XIAODONG CHEN 4, ANDREY S. KRYLOV 5, AND YONG DING 1
1 College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2 School of Automobile Studies, Tongji University, Shanghai 201804, China
3 College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
4 The 14th Research Institute of China Electronics Technology Group Corporation, Nanjing 210039, China
5 Laboratory of Mathematical Methods of Image Processing, Lomonosov Moscow State University, Moscow 119991, Russia

Corresponding author: Yong Ding (dingy@vlsi.zju.edu.cn).

This work was supported by the National Science and Technology Major Project under Grant 2016ZX01012101-003 and the Fundamental Research Funds for the Central Universities.

ABSTRACT Motivated by the success of convolutional neural networks (CNNs) in image-related applications, in this paper we design an effective method for no-reference 3D image quality assessment (3D IQA) built on CNN-based feature extraction and a consolidation strategy. In the first and most critical stage, quality-aware features, which reflect the inherent quality of an image, are extracted by a CNN model fine-tuned under the transfer learning paradigm. This fine-tuning strategy alleviates the dependence on large-scale training data that constrains current deep-learning-based IQA algorithms. In the second stage, features from the left and right views are consolidated by linear weighted fusion, where the weight of each view is derived from its saliency map; statistical characteristics of the disparity map, computed at multiple scales, are appended as additional features. In the final quality-mapping stage, the objective score of each stereoscopic pair is obtained by support vector regression. Experimental results on public databases show that the proposed approach outperforms many existing no-reference and even full-reference methods.

INDEX TERMS No-reference 3D image quality assessment, deep neural network, transfer learning

I. INTRODUCTION
Image quality assessment (IQA) plays an important role in guiding the optimization of image processing and communication systems [1]–[3]. Since humans are the ultimate receivers of images, subjective evaluation is regarded as the most reliable way to measure perceptual quality. However, it is too cumbersome to be applied in practice [2]. Therefore, robust objective IQA methods that evaluate image quality automatically and accurately are in urgent demand. A reliable objective IQA method produces predicted scores that are consistent with the subjective values, known as mean opinion scores (MOS) [3]. With the development of 3D imaging technologies, 3D IQA has received a great deal of research attention.
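To make the consolidation and quality-mapping stages summarized in the abstract more concrete, the following minimal Python sketch illustrates one plausible reading of saliency-weighted fusion of left/right view features followed by support vector regression. It assumes NumPy and scikit-learn are available; the normalization of the saliency weights, the feature dimensions, and all names are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from sklearn.svm import SVR

def fuse_features(f_left, f_right, sal_left, sal_right):
    """Linear weighted fusion of left/right view features.

    Each view's weight is taken as its mean saliency, normalized so the
    two weights sum to one (an assumption; the paper only states that the
    per-view weights are obtained from the saliency maps).
    """
    w_l, w_r = np.mean(sal_left), np.mean(sal_right)
    total = w_l + w_r
    return (w_l / total) * f_left + (w_r / total) * f_right

# Toy example: 100 stereo pairs, 512-D CNN features per view,
# plus 12 multi-scale disparity-map statistics per pair (placeholders).
rng = np.random.default_rng(0)
feats = np.stack([
    np.concatenate([
        fuse_features(rng.random(512), rng.random(512),
                      rng.random((32, 32)), rng.random((32, 32))),
        rng.random(12),               # disparity-map statistics
    ])
    for _ in range(100)
])
mos = rng.random(100) * 100           # placeholder subjective scores

# Quality mapping by support vector regression (RBF kernel assumed).
model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(feats, mos)
predicted_score = model.predict(feats[:1])
```

In practice, the fused CNN features and the multi-scale disparity statistics would be computed from real stereoscopic pairs, and the regressor would be trained against subjective MOS values from the benchmark databases.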
However, compared with the significant progress made in 2D IQA, research on 3D IQA remains relatively immature. Generally, 3D IQA methods can be classified into three categories, full-reference (FR) [4], [5], reduced-reference (RR) [6], and no-reference (NR) [1], [2], [7], depending on whether the reference (the distortion-free image) is available. In the FR scheme, complete information about the reference is available; in RR methods, only partial information about the reference is accessible. However, the applicability of FR and RR methods is limited, since even partial reference information is unavailable in many real-world situations. By contrast, NR methods predict image quality without any reference and are therefore more appealing [8]. In this paper, considering these practical factors, we focus on developing an effective and robust 3D NR-IQA framework.

There are several challenges in the task of 3D NR-IQA. Among them, an essential step is to extract effective features that reflect the level of distortion. Traditionally, the strategies for extracting such quality-aware features fall into two categories: utilizing Natural Scene Statistics (NSS) [3], [7] or Human Visual System (HVS) models [1],