CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition

A. A. M. Muzahid, Wanggen Wan, Senior Member, IEEE, Ferdous Sohel, Senior Member, IEEE, Lianyao Wu, and Li Hou

Abstract—In computer vision, 3D object recognition is one of the most important tasks for many real-world applications. Three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in 3D object recognition. In this paper, we propose to use the principal curvature directions of 3D objects (using a CAD model) to represent the geometric features as inputs for the 3D CNN. Our framework, named CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions incorporate complex surface information of a 3D object, which helps our framework produce more precise and discriminative features for object recognition. Multitask learning is motivated by sharing features between two related tasks; we use pose classification as an auxiliary task to help CurveNet generalize better on object label classification. Experimental results show that our framework performs better with curvature vectors than with voxels as the input for 3D object classification. We further improved the performance of CurveNet by combining two networks that take both the curvature directions and the voxels of a 3D object as inputs. A Cross-Stitch module was adopted to learn effective shared features across multiple representations. We evaluated our methods on three publicly available datasets and achieved competitive performance in the 3D object recognition task.

Index Terms—3D shape analysis, convolutional neural network, DNNs, object classification, volumetric CNN.

I. Introduction

In the field of computer vision, 2D image analysis using deep learning (DL) methods has already achieved remarkable progress and, in many cases, outperforms human vision (e.g., in image classification and human face analysis) [1], [2].
However, understanding three-dimensional objects is still an open problem in modern computer vision research. A real object in three-dimensional space provides more detailed information. With the availability of low-cost 3D acquisition devices, it is easier to capture 3D objects. The rise of public repositories of 3D models has drawn attention to computer vision tasks such as 3D object recognition, reconstruction, semantic segmentation, and retrieval [3]. Convolutional neural network (CNN)-based 3D object recognition systems have advanced considerably [4], but 3D CNNs have not been as successful in identifying 3D objects as 2D CNNs, especially in object label prediction. There are several reasons for this: selecting the input features of a 3D object [5] is critical because of its complex geometrical structure, the training databases are comparatively small, and 3D CNNs have a high computational cost.

The earliest volumetric deep learning approach is 3D ShapeNets [6], which deals with three tasks including 3D object classification. Recently, several approaches have been published that solve 3D object recognition tasks using deep CNNs; voxels [6]–[9], point clouds [10]–[12], and 2D multiview [13]–[15] are the most widely used input representations for CNNs in 3D object recognition [4]. Two-dimensional multiview-based approaches that use 2D CNNs achieve high performance. This is because existing 2D-based CNNs can be directly used for 3D object recognition, and they require fewer computational resources. However, multiview representations have some technical issues; for example, choosing the number of views needed to capture the information of the entire 3D object is still an open issue. In addition, the projection of 3D data to the 2D domain discards intrinsic features (e.g., geometric, structural, and orientational information) of a 3D object.
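As a point of reference for the voxel representation mentioned above, the following is a minimal sketch of how a set of 3D surface points can be turned into a binary occupancy grid of the kind a volumetric CNN takes as input. It is an illustration only, not the preprocessing used in this paper: the function names `voxelize` and `sample_sphere`, the 32×32×32 resolution, and the synthetic sphere data are all assumptions made for the example.

```python
import math
import random


def voxelize(points, resolution=32):
    """Map a list of (x, y, z) points to a binary occupancy grid (nested lists).

    Hypothetical helper for illustration; the grid resolution is an assumption.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Uniform scale so the longest bounding-box side spans the grid;
    # fall back to 1.0 if all points coincide.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for p in points:
        # Integer voxel index per axis; min() keeps points on the far
        # boundary inside the last voxel.
        i, j, k = (
            min(int((p[a] - mins[a]) / extent * resolution), resolution - 1)
            for a in range(3)
        )
        grid[i][j][k] = 1
    return grid


def sample_sphere(n, seed=0):
    """Return n points on the unit sphere (synthetic stand-in for a 3D model)."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            pts.append(tuple(c / norm for c in v))
    return pts


grid = voxelize(sample_sphere(2000), resolution=32)
occupied = sum(cell for plane in grid for row in plane for cell in row)
print(f"occupied voxels: {occupied} of {32 ** 3}")
```

Only the voxels that the surface passes through are set, so most of the grid stays empty. This sparsity is one reason why a binary occupancy grid carries limited surface detail.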
In 3D shape analysis, the three-dimensional representation is the only way to preserve the intrinsic features of a 3D object; therefore, new features of a 3D object and advanced neural networks need to be explored to improve 3D vision tasks [8]. AI-based computer vision systems are developed with advanced machine learning (ML) algorithms (e.g., deep learning). In addition, the object classification accuracy of a CNN is highly influenced by the input features [4]. In this study, we incorporated curvature directions as input features of a 3D object into our novel 3D CNN to identify object labels. Principal curvature directions of 3D objects are considered perceptually relevant properties for the human visual system (HVS) and are widely used in 3D mesh visual quality evaluation [16], [17]. Considering perceptual features in the HVS, curvature maps represent the salient structural

Manuscript received May 21, 2020; accepted June 24, 2020. This paper was partially supported by a project of the Shanghai Science and Technology Committee (18510760300), Anhui Natural Science Foundation (1908085MF178), and Anhui Excellent Young Talents Support Program Project (gxyqZD2019069). Recommended by Associate Editor Shangce Gao. (Corresponding author: A. A. M. Muzahid.)

Citation: A. A. M. Muzahid, W. G. Wan, F. Sohel, L. Y. Wu, and L. Hou, “CurveNet: Curvature-based multitask learning deep networks for 3D object recognition,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, Jun. 2021.

A. A. M. Muzahid, W. G. Wan, and L. Y. Wu are with the School of Communications and Information Engineering, Institute of Smart City, Shanghai University, Shanghai 200444, China (e-mail: muzahid@shu.edu.cn; wanwg@staff.shu.edu.cn; pisuto@shu.edu.cn).

F. Sohel is with the Discipline of Information Technology, Murdoch University, Murdoch WA 6150, Australia (e-mail: f.sohel@murdoch.edu.au).

L. Hou is with the School of Information Engineering, Huangshan University, Huangshan 245041, China (e-mail: houli_1981@hsu.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2020.1003324

IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 8, NO. 6, JUNE 2021