Fixed Size Encoding Scheme for Software Watermarking
Azyan Yusra Kapi, Department of Computer Science, Universiti Teknologi MARA, Negeri Sembilan, Malaysia, azyanyusra@ns.uitm.edu.my
Subariah Ibrahim, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Johor, Malaysia, subariah@utm.my
Abstract-Software piracy has become a major threat to software developers. One technique that can be used against it is to embed a watermark into software, which can later be extracted to prove ownership. During the last few years, different algorithms have been developed to embed a watermark inside software. One of them is the dummy method insertion technique, in which a dummy method that carries the watermark is inserted into the software application. The disadvantage of this algorithm is that the watermark is embedded in a particular instruction in the dummy method, so the length of that instruction depends on the length of the watermark. This makes the dummy method look suspicious and noticeable to pirates. In this paper, we present an encoding scheme that produces a fixed-size encoded watermark, making the encoded watermark in the dummy method less noticeable. The proposed encoding scheme uses a hash function, so the size of the encoded watermark is always fixed even though the number of characters in the watermark varies. As a result, our encoding scheme produces a fixed-size dummy method and could make the dummy method less noticeable to pirates.
Keywords-software watermarking; encoding scheme; hash function; software piracy
I. INTRODUCTION
Software is widely used in daily life, in areas such as e-commerce, industry, and many more. As the usage of software grows rapidly, the rise of software piracy has become a major concern for software developers and software vendors. Software piracy has caused losses of USD 50 billion to the global software industry and USD 368 million in Malaysia [1].
Software piracy is one of the major threats to the software industry: it keeps growing and violates the intellectual property of developers [2]. Pirates tend to copy algorithms from software and, worse, claim them as their own work [3]. Many attempts have therefore been made to discourage software piracy [4]. Software watermarking is one of the techniques that can be used to hide information inside software [5]. Cappaert et al. [5] state that this information can be retrieved later and used to prove the ownership of the original developer. For example, suppose A is a software developer who sells his software to B. B copies the software, sells it to third parties, and claims it as his own work. When A learns that B has copied his product, A can use the watermark embedded in the software to prove his ownership.
Given the importance of software watermarking, many algorithms have been developed for embedding and recognizing watermarks [6]. Usually, the watermark is translated into an unreadable string before it is embedded into the software, so that the watermark is not visible [7]. This process of converting and translating the watermark is known as the encoding process. In this paper, we present an encoding scheme that produces a fixed-size encoded watermark, so that the dummy method used to embed the watermark is less noticeable to attackers. Furthermore, with the proposed encoding scheme, long information can be used as a watermark.
The paper is organized as follows. The next section describes related works in software watermarking. Section III gives an overview of the encoding process, explains our proposed encoding scheme, and outlines each process in it. We present the results of our proposed encoding scheme in Section IV. The paper concludes in Section V.
II. RELATED WORKS
In software watermarking, many algorithms and techniques (e.g. dummy method insertion, register allocation, opaque predicates and the graph coloring approach) have been introduced in the literature.
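As a minimal sketch of the fixed-size property that the introduction attributes to the proposed encoding, the following Python fragment hashes watermarks of different lengths and obtains codes of identical length. SHA-256 and the helper names `encode_watermark` and `matches` are assumptions made only for this illustration; the text so far says only that a hash function is used and does not name one. Because a hash is one-way, the sketch checks a claimed watermark by re-hashing it rather than by decoding the embedded code.

```python
import hashlib


def encode_watermark(watermark: str) -> str:
    """Return a fixed-size encoding of `watermark` (hypothetical helper).

    SHA-256 is an assumption of this sketch; any cryptographic hash
    with a fixed digest length would show the same property.
    """
    # hexdigest() of SHA-256 is always 64 hexadecimal characters,
    # regardless of how long the input string is.
    return hashlib.sha256(watermark.encode("utf-8")).hexdigest()


def matches(claimed_watermark: str, embedded_code: str) -> bool:
    """Check a claimed watermark against an embedded code by re-hashing."""
    return encode_watermark(claimed_watermark) == embedded_code


if __name__ == "__main__":
    short_mark = "UTM"
    long_mark = "Copyright 2011 Universiti Teknologi Malaysia, all rights reserved"
    for mark in (short_mark, long_mark):
        code = encode_watermark(mark)
        # Input lengths differ; the encoded length does not.
        print(len(mark), len(code))
```

With an encoding of this kind, the instruction that carries the code has the same length for a three-character watermark as for a full copyright sentence, which is the property the dummy method needs in order not to grow with the watermark.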
Dummy method insertion was first introduced by Monden et al. [8]. The algorithm inserts a dummy method into the software, and this method is then used to hide the watermark. The advantages and disadvantages of the dummy method insertion technique have been evaluated in [9]. One of its advantages is a high data rate, which means that the algorithm can hide a large watermark within the software. The encoding scheme used in [8] builds its own translation code based on watermark character sequences. A hash table needs to be built in order to provide the assignment rule for the embedding process. Because the dummy method algorithm prepares the space for the dummy method according to the size of the watermark, it has no difficulty in embedding a large watermark. Thus, no could provide spaces for the watermark. However, in roduced. In this situation,
978-1-4577-2155-7/11/$26.00 © 2011 IEEE