Proceedings of the 3rd National Conference; INDIACom-2009
Computing For Nation Development, February 26 – 27, 2009
Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi

A Survey on Text Based Steganography

Hitesh Singh, Pradeep Kumar Singh, Kriti Saroha
School of Information Technology, Center for development of Advance Computing, Noida, India
hitesh.singh.85@gmail.com, pradeep_84cs@yahoo.com, kritisaroha@cdacnoida.in

ABSTRACT
In the modern era of information technology, illicit copying and illegal distribution accompany the widespread electronic distribution of copyrighted material. This is the main reason people look for ways to protect their work and to prevent such unlawful activity. Various methods have been used for this purpose, including cryptography, steganography, and coding. Steganography is the best-suited technique, allowing a user to hide a message inside another message (the cover medium). Most steganography research uses pictures, video clips, and sounds as cover media. Text steganography is not normally preferred because it is difficult to find redundant bits in a text document. To embed information inside a document, its characteristics must be altered. These characteristics can be either the text format or the features of the characters. The problem is that even a slight change to the document may become visible to a third party or attacker. The key is to alter the document in such a way that the change is not visible to the human eye yet can still be decoded by a computer. For this purpose, various methods of text-based steganography have been proposed, such as line shifting, word shifting, feature coding, and white space manipulation. In this paper, we present an overview of steganography, with a particular focus on text-based steganography.
Keywords: copyrighted material, cryptography, steganography, text-based steganography.

1. INTRODUCTION
Though security is nothing new, the way it has become part of our daily lives is unprecedented. From pass codes that we use to enter our own highly secure homes, to retina-scanning technology that identifies us as we enter our office buildings, to scanners in airports, we have made security technology as much a part of our daily lives as the telephone or automobile. We are also surrounded by a world of secret communication, where people of all types transmit information as innocent as an encrypted credit card number sent to an online store and as insidious as a terrorist plot sent to hijackers.

The schemes that make secret communication possible are not new. Julius Caesar used cryptography to encode political directives. Steganography (commonly referred to as stego), the art of hidden writing, has also been used for generations. But the intersection of these schemes with the pervasive use of the Internet, high-speed computer and transmission technology, and our current world political climate makes this a unique moment in history for covert communication.

The word steganography comes from the Greek steganos (covered) and graphein (to write), that is, covered writing [3]. Steganography is an ancient art of embedding private messages in seemingly innocuous messages in such a way that a third party cannot detect the secret messages. In other words, steganography means establishing covert channels. A covert channel is a secret communication channel used for transmitting information. As shown in Figure 1.1, two general directions can be distinguished within steganography: protection against detection and protection against removal. Protection against detection is achieved using schemes that do not visibly modify the original unmarked object; the modifications cannot be seen by humans or detected by computers.
Protection against removal requires that the scheme be robust to common attacks: it should be impossible to remove the hidden data without degrading the object’s quality and rendering it useless.

Steganography (covered writing, covert channels)
    Protection against detection (data hiding)
    Protection against removal (document marking)
        Watermarking (all objects are marked in the same way)
        Fingerprinting (identify all objects; every object is marked specifically)
Figure 1.1. Directions within Steganography.

2. TEXT STEGANOGRAPHY
Text steganography [2,3,8], which is the specific subject of this paper, uses text as the medium in which to hide information. It is the most difficult kind of steganography, largely because a text file contains relatively little redundant information compared with a picture or a sound file. The structure of a text document is identical to what we observe, whereas in other types of documents, such as pictures, the structure of the document differs from what we observe. Therefore, in such documents, we can hide information by