Multi-Stage Point Completion Network with Critical Set Supervision

Wenxiao Zhang a, Chengjiang Long b, Qingan Yan c, Alix L.H. d, Chunxia Xiao a,*

a School of Computer Science, Wuhan University
b Kitware Inc., Clifton Park, NY, USA
c JD.com American Technologies Corporation, CA
d Xiaomi

Abstract

Point cloud based shape completion, which refers to reconstructing a complete point cloud from a partial input, has significant application value. In this paper, we propose a multi-stage point completion network (MSPCN) with critical set supervision. In our network, a cascade of upsampling units progressively recovers the high-resolution result over several stages. Unlike existing works, which supervise the generated output point cloud with the complete ground truth, we use the critical set at each stage for supervision and generate more informative and useful intermediate outputs for the next stage. We propose a strategy that combines max-pooling selected points and volume-downsampling points to determine critical sets (MVCS) for supervision, which accounts for both the critical features and the shape of the model. We conduct extensive experiments on the ShapeNet dataset, and the experimental results clearly demonstrate that our proposed MSPCN with critical set supervision outperforms the state-of-the-art completion methods.

Keywords: Shape completion, Point cloud, Deep learning

1. Introduction

An increasingly large volume of 3D data is becoming available thanks to the rapid growth of 3D scanning technology with low-cost sensors such as depth cameras and LiDAR. However, the acquired scan data is often incomplete due to occlusion and limited sensor resolution. It is therefore desirable to recover a complete shape from a partial input. This task, known as shape completion, has significant value in multiple fields such as 3D reconstruction Dai et al. (2017); Liao et al. (2019); Fu et al. (2018); Yan et al. (2017, 2016), robotics Varley et al.
(2017), scene understanding Dai et al. (2018) and autonomous driving Yang et al. (2019).

Most existing deep learning methods for shape completion discretize the 3D data into voxel representations, such as occupancy grids or Truncated Signed Distance Function (TSDF) volumes, on which convolution operations can be applied directly. However, the output of these methods is limited to low resolution by the memory cost of the volumetric representation, and therefore loses some object details. As a raw representation of 3D objects, the point cloud is able to overcome this shortcoming of the volumetric representation. Recently, Yuan et al. (2018) proposed the first point completion network, using an encoder-decoder network in a coarse-to-fine fashion. Taking a point cloud as input, this two-stage network generates a coarse output at the first stage and then produces the final result from the coarse output at the second stage. This motivates us to further explore the idea of multi-stage refinement in point completion networks, as illustrated in Figure 1.

In this paper, we propose a multi-stage point completion network (MSPCN), as shown in Figure 2. We argue that it is intuitive to recover the complete object shape progressively over multiple stages, where the network first generates a low-resolution result and then infers a higher-resolution shape from the lower-resolution result of the previous stage.

* Corresponding author
Email addresses: wenxxiao.zhang@gmail.com (Wenxiao Zhang), cjfykx@gmail.com (Chengjiang Long), qingan.yan@jd.com (Qingan Yan), zhouliheng@xiaomi.com (Alix L.H.), cxxiao@whu.edu.cn (Chunxia Xiao)

Preprint submitted to CAGD; Special Issue of GMP 2020, March 30, 2020