Abstract—The Internet is largely composed of textual content, and a huge volume of digital content is published on it daily. The ease of information sharing and reproduction has made it difficult to protect an author's copyright. Since 1993, digital watermarking has been explored as a solution to the problem of copyright protection for plain text. In this paper, we propose a zero text watermarking algorithm, based on the occurrence frequency of non-vowel ASCII characters and words, for copyright protection of plain text. The embedding algorithm uses the frequency of non-vowel ASCII characters and words to generate a specialized author key. The extraction algorithm uses this key to extract the watermark and hence identify the original copyright owner. Experimental results illustrate the effectiveness of the proposed algorithm on text subjected to meaning-preserving attacks performed by five independent attackers.

Keywords—Copyright protection, Digital watermarking, Document authentication, Information security, Watermark.

I. INTRODUCTION

SECURITY of digital content has gained tremendous importance in the current digital era. The Internet has become an essential part of our daily life for the transfer of different forms of data, such as emails, newspapers, articles, websites, images, audio, video, commercials, and opinion blogs. Most of the information on the Internet is in the form of text, and copyright protection of text is one of the major concerns of its creator or author. To protect copyrights, digital watermarking emerged as a way to identify the owner of copyrighted material. For audio, video, and images, digital watermarking has been used for decades. However, no significant work has been done regarding the copyright protection of plain text documents. Text is a very important form of content on the Internet. It is part of e-books, websites, articles, news, chats, emails, and SMS.
Jalil Z. is with the Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan (email: zunera.jalil@nu.edu.pk). Farooq M. and Zafar H. are with Air University, Islamabad, Pakistan (email: 91102@students.au.edu.pk). Sabir M. and Ashraf E. are faculty members at the Department of Computer Science and Software Engineering, Air University, Islamabad, Pakistan (email: maria.sabir@mail.au.edu.pk, erum.ashraf@mail.au.edu.pk).

Text documents face many threats, such as copying, tampering, plagiarism, reproduction, and paraphrasing attacks. The best solution to these problems is digital watermarking, which helps not only in authenticating digital material but also in protecting it. Digital watermarking can be used to identify the owner of copyrighted material, which may be in the form of audio, video, images, or plain text. There are two forms of digital watermarking, visible and invisible, and the latter is considered more robust. A digital watermark is an identification code embedded in the data. This means that, unlike conventional cryptographic techniques, it remains present within the data even after decryption [1]. The problem of digital text watermarking has been studied in the past, but a practical and efficient text watermarking algorithm that withstands meaning-preserving attacks has not yet been provided.

The main contributions of this paper to the watermarking community are:
• A zero text watermarking algorithm for copyright protection of plain text documents is proposed.
• There is no restriction on the type or length of the text.
• Pure alphabetical watermarks are used, which are more convenient for plain text.
• No changes are made to the text; instead, attributes of the text are used in the proposed approach.
• For medium-sized texts such as emails, short articles, and news, the approach is robust and practical for identifying the original copyright owner of the content.
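The zero-watermarking idea in the contributions above (the text is left untouched and a key is derived from its attributes) can be illustrated with a minimal sketch. This is not the algorithm proposed in this paper, whose details are given in Section 3. It is a simplified, hypothetical scheme that assumes the only attribute used is the frequency ranking of non-vowel ASCII letters. The function names `letter_order`, `generate_key`, and `extract_watermark` are illustrative assumptions.

```python
from collections import Counter
import string

VOWELS = set("aeiou")


def letter_order(text):
    """Return all 26 lowercase letters in a text-dependent order.

    Non-vowel letters found in the text come first, most frequent first
    (ties broken alphabetically); every other letter follows alphabetically.
    """
    counts = Counter(
        ch for ch in text.lower()
        if ch in string.ascii_lowercase and ch not in VOWELS
    )
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    rest = [ch for ch in string.ascii_lowercase if ch not in counts]
    return ranked + rest


def generate_key(text, watermark):
    """Hypothetical embedding step: derive a key, leaving the text unchanged.

    The watermark must be purely alphabetical (ASCII letters only);
    any other character raises ValueError.
    """
    order = letter_order(text)
    return [order.index(ch) for ch in watermark.lower()]


def extract_watermark(text, key):
    """Hypothetical extraction step: rebuild the watermark from text and key."""
    order = letter_order(text)
    return "".join(order[i] for i in key)


if __name__ == "__main__":
    doc = "Digital watermarking protects the copyright of plain text documents."
    key = generate_key(doc, "author")
    print(key)
    print(extract_watermark(doc, key))
```

In this sketch the key, not the document, carries the watermark, so the scheme adds no visible change to the text. An insertion or deletion attack alters the recovered watermark only if it changes the frequency ranking of the letters that the key points to.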
This paper is organized as follows: Section 2 gives an overview of earlier work on text watermarking. The proposed embedding and extraction algorithms are discussed in detail in Section 3. In Section 4, experimental results are provided for intelligent meaning-preserving attacks (insertion and deletion) performed by five different attackers. The efficiency of the proposed algorithm is analyzed under five different attacks on the same text. The last section concludes the paper along with directions for future research.

II. STATE OF THE ART

Text watermarking is an emerging research domain. A robust and practical solution may open new horizons in the information security world. Many watermarking techniques have been developed since 1993, including text-image-based, synonym-based, noun-verb-based, word- and sentence-structure-based, and acronym-based schemes, among many others. These schemes can be placed in the following categories: image-based schemes, syntactic schemes, semantic schemes, and structural schemes.

Improved Zero Text Watermarking Algorithm against Meaning Preserving Attacks. Jalil Z., Farooq M., Zafar H., Sabir M., and Ashraf E. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, Vol:4, No:10, 2010. waset.org/Publication/9037