Novel Steganography over HTML Code

Ammar Odeh, Khaled Elleithy, Miad Faezipour, and Eman Abdelfattah
Department of Computer Science & Engineering, University of Bridgeport
Bridgeport, CT 06604, USA
{aodeh, elleithy, faezipour, eman}@bridgeport.edu

Abstract: Different security strategies have been developed to protect the transfer of information between users. This has become especially important with the tremendous growth of Internet use. Encryption techniques convert readable data into a ciphered form. Other techniques hide the message in another file, and some powerful techniques combine hiding and encryption. In this paper, a new security algorithm is presented that applies Steganography to HTML pages. Hiding the information inside HTML page code comments and employing encryption can reduce the possibility of discovering the hidden data. The proposed algorithm applies some statistical concepts to create a frequency array that records the occurrence frequency of each character. The encryption step depends on two simple logical operations that change the form of the data and increase the complexity of the hiding process. The last step is to embed the encrypted data as comments inside the HTML page. This new algorithm offers several advantages, such as generality and applicability to different spoken languages, and it can be extended to other Web page formats such as XML and ASP.

Keywords: Steganography, Carrier file, Encryption, HTML code

I. INTRODUCTION

The rapid growth of the Internet has led to an increasing demand for security mechanisms that facilitate the transfer of sensitive information through different networks. Since the Internet is a public medium used to transfer information between different parties [1], hackers can expose the contents of messages exchanged between communicating parties. In response, different methods have been developed to prevent attempts to break or expose the actual messages.
Encryption algorithms reported in the literature protect sensitive information by converting plaintext into ciphertext. Modern encryption algorithms depend on sophisticated mathematical operations to change the form of the information. Other techniques depend on concealing the existence of the message, which is called Steganography [2]. As Figure 1 shows, Steganography consists of three main components: the embedding algorithm, the carrier file, and the hidden message.

Figure 1. Embedding Algorithm

The carrier file plays an important role in designing steganography algorithms. Image, audio, video, and text are different media used frequently over the Internet [3]. Each of these carrier file types has certain characteristics that enable the user to insert data inside it. Image files, which contain a high ratio of data frequency [4], are the most widely used carrier files. However, it is not easy to use the same image to hide different messages, since comparing similar images may allow attackers to expose the concealed data. Audio files are represented as sine or cosine waves. Some techniques shift the phase to hide zeros and ones [5]. Text files are the most difficult carrier files, since they contain little redundant data compared to other carriers [6].

Text Steganography is classified into different categories. One of the most popular text Steganography methods is semantic Steganography [7]. This technique makes use of synonyms in the same language or in similar languages, such as American English and British English. This is done by creating a dictionary of synonyms and exchanging words to convey a zero or a one. Other categories hide data depending on the language syntax. These are among the earliest techniques and employ the physical format of the text to conceal information. Other scenarios employ linguistic properties to hide data and depend on the file generation to convey the information [8].
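The synonym-exchange idea behind semantic Steganography can be illustrated with a minimal sketch. It is not the method of [7]; the synonym table, the function names `embed` and `extract`, and the convention that the first word of each pair stands for 0 and the second for 1 are all assumptions made for illustration. The receiver is also assumed to know the number of hidden bits in advance.

```python
# Hypothetical synonym table (an assumption for illustration only):
# the key encodes bit 0 and the value encodes bit 1.
PAIRS = {"big": "large", "quick": "fast", "begin": "start", "color": "colour"}

# Map every word in the table to its (zero_form, one_form) pair.
FORMS = {}
for zero, one in PAIRS.items():
    FORMS[zero] = (zero, one)
    FORMS[one] = (zero, one)


def embed(cover_text, bits):
    """Replace substitutable words so that each one carries one bit."""
    remaining = list(bits)
    out = []
    for word in cover_text.split():
        if word in FORMS and remaining:
            bit = int(remaining.pop(0))
            out.append(FORMS[word][bit])  # pick the form that encodes this bit
        else:
            out.append(word)  # not in the table, or no bits left to hide
    if remaining:
        raise ValueError("cover text has too few substitutable words")
    return " ".join(out)


def extract(stego_text, n_bits):
    """Read back the first n_bits carried by substitutable words."""
    bits = []
    for word in stego_text.split():
        if word in FORMS and len(bits) < n_bits:
            zero, _one = FORMS[word]
            bits.append("0" if word == zero else "1")
    return "".join(bits)


if __name__ == "__main__":
    stego = embed("the big dog will begin a quick run", "101")
    print(stego)
    print(extract(stego, 3))
```

The capacity of such a scheme is one bit per substitutable word, which is consistent with the observation above that text offers little redundant data compared to other carriers.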
In Section II of this paper, prior work is presented and compared. The proposed algorithm is discussed in Section III. Experimentation and results are demonstrated in Section IV. The algorithm is analyzed in Section V. Finally, Section VI offers conclusions.

II. PRIOR WORKS

HTML, or HyperText Markup Language, is the basic markup language for web pages, which can be combined with other technologies such as Macromedia Flash and JavaScript for animation purposes [9]. Moreover, HTML does not need special software for writing pages. Most of the new web programming languages are based on HTML concepts. Generally, HTML is used to create the static part of websites. HTML code consists of two parts: i) tags, which are surrounded by angle brackets (< >), and ii) the information between tags. Internet browsers display only the content without the tags, since tags control the appearance of the