[Extended version] Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks

Lixin Fan 1, Kam Woh Ng 2, Chee Seng Chan 2
1 WeBank AI Lab, Shenzhen, China
2 Center of Image and Signal Processing, Faculty of Comp. Sci. and Info. Tech., University of Malaya, Kuala Lumpur, Malaysia
{lixinfan@webank.com; kamwoh@siswa.um.edu.my; cs.chan@um.edu.my}

Abstract—With substantial amounts of time, resources and human (team) effort invested to explore and develop successful deep neural networks (DNN), there emerges an urgent need to protect these inventions from being illegally copied, redistributed, or abused without respecting the intellectual property of their legitimate owners. Following recent progress along this line, we investigate a number of watermark-based DNN ownership verification methods in the face of ambiguity attacks, which aim to cast doubt on ownership verification by forging counterfeit watermarks. It is shown that ambiguity attacks pose serious threats to existing DNN watermarking methods. As a remedy to this loophole, this paper proposes novel passport-based DNN ownership verification schemes which are both robust to network modifications and resilient to ambiguity attacks. The gist of embedding digital passports is to design and train DNN models in such a way that the inference performance on the original task is significantly deteriorated by forged passports. In other words, genuine passports are not only verified by looking for the predefined signatures, but also reasserted by the unyielding DNN model inference performance. Extensive experimental results justify the effectiveness of the proposed passport-based DNN ownership verification schemes. Code and models are available at https://github.com/kamwoh/DeepIPR

I. INTRODUCTION

With the rapid development of deep neural networks (DNN), Machine Learning as a Service (MLaaS) has emerged as a viable and lucrative business model.
However, building a successful DNN is not a trivial task: it usually requires substantial investments of expertise, time and resources. As a result, there is an urgent need to protect invented DNN models from being illegally copied, redistributed or abused (i.e. intellectual property infringement). Recently, digital watermarking techniques have been adopted to provide such protection by embedding watermarks into DNN models during the training stage. Subsequently, ownership of these inventions is verified by detecting the embedded watermarks, which are supposed to be robust to multiple types of modification such as model fine-tuning, model pruning and watermark overwriting [1], [2], [3], [4].

In terms of deep learning methods to embed watermarks, existing approaches can be broadly categorized into two schools: a) feature-based methods that embed designated watermarks into the DNN weights by imposing additional regularization terms [1], [3], [5]; and b) trigger-set based methods that rely on adversarial training samples with specific labels (i.e. backdoor trigger sets) [2], [4]. Watermarks embedded with either family of methods have demonstrated robustness against removal attacks, which modify the DNN weights through e.g. fine-tuning or pruning. However, our studies disclose the existence and effectiveness of ambiguity attacks, which aim to cast doubt on ownership verification by forging additional watermarks for the DNN models in question (see Fig. 1). We also show that it is always possible to reverse-engineer forged watermarks at minor computational cost, without access to the original training dataset (Sect. II).

As a remedy to this loophole, this paper proposes a novel passport-based approach. The unique advantage of the proposed passports over traditional watermarks is that
the inference performance of a pre-trained DNN model will either remain intact in the presence of valid passports, or deteriorate significantly given modified or forged passports. In other words, we propose to modulate the inference performance of the DNN model depending on the presented passports; by doing so, one can develop ownership verification schemes that are both robust to removal attacks and resilient to ambiguity attacks at once (Sect. III).

The contributions of our work are threefold: i) we put forth a general formulation of DNN ownership verification schemes and empirically show that existing DNN watermarking methods are vulnerable to ambiguity attacks; ii) we propose novel passport-based verification schemes and demonstrate with extensive experimental results that these schemes successfully defeat ambiguity attacks; iii) methodology-wise, the proposed modulation of DNN inference performance based on the presented passports (Eq. 5) plays an indispensable role in bringing the DNN model behaviour under control against adversarial attacks.

A. Related work

Uchida et al. [1] was probably the first work to propose embedding watermarks into DNN models by imposing an additional regularization term on the weight parameters. [2], [6] proposed to embed watermarks in the classification labels

arXiv:1909.07830v3 [cs.CR] 2 Nov 2019
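To make the feature-based school concrete, the following is a minimal NumPy sketch in the spirit of Uchida et al. [1]: a binary signature is embedded into a (flattened) weight vector by adding a cross-entropy regularizer on a secret linear projection of the weights, and later extracted by thresholding that projection. The dimensions, learning rate, and helper names (`embed_loss`, `extract_bits`) are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): w stands in for a
# flattened conv kernel, b is the owner's T-bit signature.
D, T = 64, 32
X = rng.standard_normal((T, D))   # secret projection matrix (the key)
b = rng.integers(0, 2, T)         # target watermark bits

def embed_loss(w, X, b):
    """Binary cross-entropy regularizer added to the task loss so that
    sign(X @ w) converges to the signature bits b."""
    p = 1.0 / (1.0 + np.exp(-X @ w))              # sigmoid of projection
    return -np.mean(b * np.log(p + 1e-12) + (1 - b) * np.log(1 - p + 1e-12))

def extract_bits(w, X):
    """Verification step: threshold the projection to recover the bits."""
    return (X @ w > 0).astype(int)

# Toy gradient descent on the regularizer alone, showing the bits embed.
w = rng.standard_normal(D) * 0.01
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * (X.T @ (p - b)) / T                # gradient of embed_loss

assert (extract_bits(w, X) == b).all()            # signature recovered
```

Because any party who knows (or reverse-engineers) a matching pair (X, b) can make this check pass, detection alone is exactly the verification step that the ambiguity attacks of Sect. II exploit.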
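The passport-based modulation described above can be caricatured in a few lines: roughly, the affine scale and bias of a "passport layer" are not free parameters but are derived from the layer weights and the presented passport tensors, so a forged passport yields wrong scale/bias and hence degraded inference. This is only a toy NumPy sketch under that reading; the precise derivation is Eq. 5 of the paper, and all shapes and names here (`passport_scale_bias`, `passport_layer`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def passport_scale_bias(W, passport):
    """Derive the affine parameters from the layer weights W and a
    passport pair (P_gamma, P_beta), in the spirit of Eq. 5:
    gamma = Avg(W * P_gamma), beta = Avg(W * P_beta)."""
    P_gamma, P_beta = passport
    return np.mean(W * P_gamma), np.mean(W * P_beta)

def passport_layer(x, W, passport):
    """Linear layer whose output is scaled and shifted by the
    passport-derived parameters; a wrong passport distorts gamma and
    beta and thereby the layer's (and the network's) behaviour."""
    gamma, beta = passport_scale_bias(W, passport)
    return gamma * (x @ W) + beta

# Hypothetical weights, a genuine passport and an attacker's forgery.
W = rng.standard_normal((8, 4))
genuine = (rng.standard_normal((8, 4)), rng.standard_normal((8, 4)))
forged = (rng.standard_normal((8, 4)), rng.standard_normal((8, 4)))

x = rng.standard_normal(8)
y_genuine = passport_layer(x, W, genuine)
y_forged = passport_layer(x, W, forged)
assert not np.allclose(y_genuine, y_forged)   # forgery changes the output
```

The design point is that ownership is no longer attested by a detectable mark alone: only the genuine passport reproduces the parameters the network was trained with, so the claim is reasserted by the model's own inference performance.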