International Journal of Ethics in Engineering & Management Education
Website: www.ijeee.in (ISSN: 2348-4748, Volume 1, Issue 1, January 2014)

A Novel Approach of Zero Watermarking for Text Documents

Pankaj Bhambri
Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India
pkbhambri@gmail.com

Pradeep Kaur
Research Scholar, Punjab Technical University, Jalandhar, Punjab, India

Abstract— With the widespread use of the Internet and other communication technologies, it has become extremely easy to reproduce, communicate, and distribute digital content. As a result, authentication and copyright protection have become pressing issues. Besides image, audio, and video, text is the most extensively used medium travelling over the Internet. Most of the content of books, newspapers, web pages, advertisements, research papers, legal documents, letters, novels, poetry, and many other documents is plain text. Copyright protection of plain text is therefore a significant issue that cannot be ignored. In this paper, we propose a zero-watermarking approach to text watermarking: a zero text watermarking algorithm, based on the occurrence frequency of vowel ASCII characters and words, for copyright protection of plain text. The embedding algorithm uses the occurrence frequencies of vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract the watermark and hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text subjected to meaning-preserving attacks performed by five independent attackers, and the results are also compared with recent work on text watermarking.

Index Terms— Copyright protection, Digital watermarking, Document authentication, Watermark embedding and extraction

I. INTRODUCTION

Unlike analog media, which are by now becoming obsolete, digital media can be accessed, stored, copied, and distributed easily and almost instantly. Advances in digital media and technologies have brought enormous benefits, but they also create problems for parties wishing to prevent unauthorized copying and distribution of valuable digital content such as copyrighted, commercial, secret, and sensitive data. Security of digital content has therefore gained tremendous importance in the current digital era. The Internet has become an essential part of daily life for transferring many forms of data, such as emails, newspapers, articles, websites, images, audio, video, commercials, and opinion blogs. Most of the information on the Internet is in the form of text, and copyright protection of text is one of the major concerns of its creator/author. Text is the most essential and dominant part of legal documents, reports, and journals, yet its protection has been seriously neglected. The threats of electronic publishing, such as illegal copying and redistribution of copyrighted material, plagiarism, and other forms of copyright violation, need to be explicitly addressed, particularly for plain text.

II. DIGITAL WATERMARKING

A digital watermark is a piece of information that is embedded in digital media and hidden in the digital content in such a way that it is inseparable from its data. This piece of information, known as a watermark, tag, or label, is inserted into a multimedia object so that it can be detected or extracted later to make an assertion about the object. The object may be an image, audio, video, or text. Watermarking is thus the process of inserting a digital signal or pattern (indicative of the owner of the content) into digital content. The signal, known as a watermark, can later be used to identify the owner of the work, to authenticate the content, and to trace illegal copies of the work.
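The zero-watermarking approach summarized in the abstract differs from the insertion-based definition above: nothing is written into the text, and the "watermark" is a key derived from features the text already has. The Python sketch below illustrates that idea under stated assumptions. The exact key-generation rules of the proposed algorithm are not given in this section, so the feature set used here (the five ASCII vowels ranked by occurrence frequency, plus the few most frequent words), the function names `generate_key` and `extract_and_verify`, and the match-fraction score are all hypothetical choices for illustration, not the authors' algorithm.

```python
import re
from collections import Counter

VOWELS = "aeiou"


def vowel_frequencies(text: str) -> Counter:
    """Count occurrences of each ASCII vowel, ignoring case."""
    lowered = text.lower()
    return Counter({v: lowered.count(v) for v in VOWELS})


def word_frequencies(text: str) -> Counter:
    """Count occurrences of each lowercased word (ASCII letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def generate_key(text: str, author_id: str, top_words: int = 3) -> dict:
    """Hypothetical zero-watermark key: the text is left unchanged and the key
    records features derived from it together with the owner's identity."""
    vf = vowel_frequencies(text)
    # ASSUMPTION: vowels ordered by descending frequency, ties broken alphabetically.
    vowel_rank = "".join(sorted(VOWELS, key=lambda v: (-vf[v], v)))
    wf = word_frequencies(text)
    # ASSUMPTION: the most frequent words, ties broken alphabetically.
    ranked_words = sorted(wf.items(), key=lambda kv: (-kv[1], kv[0]))
    frequent = [w for w, _ in ranked_words[:top_words]]
    return {"author": author_id, "vowel_rank": vowel_rank, "top_words": frequent}


def extract_and_verify(suspect_text: str, key: dict) -> float:
    """Recompute the features of a suspect text and return the fraction that
    agree with the registered key (1.0 means every feature matches)."""
    candidate = generate_key(suspect_text, key["author"], len(key["top_words"]))
    # Position-by-position agreement of the vowel ranking.
    rank_matches = sum(a == b for a, b in zip(candidate["vowel_rank"], key["vowel_rank"]))
    # Overlap between the frequent-word sets.
    word_matches = len(set(candidate["top_words"]) & set(key["top_words"]))
    total = len(key["vowel_rank"]) + len(key["top_words"])
    return (rank_matches + word_matches) / total


if __name__ == "__main__":
    original = "the cat sat on the mat"
    key = generate_key(original, "author-001")
    print(key)
    print(extract_and_verify(original, key))                  # unmodified copy
    print(extract_and_verify("the cat sat on the rug", key))  # one word replaced
```

On this toy pair, the unmodified copy scores 1.0 and replacing a single word lowers the score to 0.625. A practical scheme would also need a decision threshold and evaluation on realistic document lengths, which this sketch does not attempt.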
There are two types of digital watermarking: visible (perceptible) and invisible (imperceptible). In visible watermarking, watermarks are embedded in such a way that they are visible when the content is viewed. Invisible watermarks cannot be seen, but the watermark can be recovered with an appropriate decoding algorithm. Invisible watermarks are more robust than visible ones.

Watermarking can also be robust or fragile. Robust watermarking is a technique in which the watermark survives modification of the watermarked content. In fragile watermarking, by contrast, the watermark is destroyed when the watermarked content is modified or tampered with.

Watermarking can also be classified by the type of document to be watermarked: image watermarking, video watermarking, audio watermarking, and text watermarking. Existing text watermarking solutions are not robust against random tampering attacks such as insertion and deletion attacks. In this paper, we propose a zero text watermarking algorithm that is resistant to random tampering attacks.

The important issues that arise in the study of digital watermarking techniques are capacity, robustness, transparency, and security. Cryptography provides security only through encryption and decryption, and encryption cannot protect the content after it has been decrypted. Unlike cryptography, watermarks can protect content even after it has been decoded. Also, cryptography cannot prevent illegal