2018 21st International Conference of Computer and Information Technology (ICCIT), 21-23 December, 2018
978-1-5386-9242-4/18/$31.00 ©2018 IEEE

A Feasible 6 Bit Text Database Compression Scheme with Character Encoding (6BC)

Md. Ashiq Mahmood 1, Tarique Latif 1, K.M. Azharul Hasan 1, Md. Riadul Islam 2
Department of Computer Science and Engineering (CSE)
1 Khulna University of Engineering & Technology, 2 North Western University, Khulna
Khulna, Bangladesh
Email: {ashiqmahmoodbipu, tariquelatifsami, azhasan, riadnwu}@gmail.com

Abstract- Character encoding represents a repertoire of characters by means of an encoding scheme. An efficient character encoding is always valuable because it requires fewer bits and less time to process information. It has a wide range of applications, including data communication, data storage, transmission of textual information, and database technology. In this paper, a new compression technique for text data is proposed that encodes each character in 6 bits, named 6-Bit Text database Compression (6BC). The technique encodes printable characters in 6 bits using a lookup table. It converts 8-bit characters into 6-bit codes by partitioning the characters into 4 sets and then using the location of each character to encode it uniquely in 6 bits. The technique is also applied in database technology by compressing the text data in a relation of a database. With the help of the lookup table, 6BC can both compress and decompress the original data. The reverse procedure for decompression, which recovers the original data, is also described. The output of 6BC is further compressed with the well-known Huffman and LZW algorithms. Our experimental results show promising efficiency. The procedure is further illustrated with examples and descriptions.
Keywords- character encoding; compression; decompression; 6 bit text database compression; compression ratio.

I. INTRODUCTION

In data science applications, database compression means storing data in less memory space while the actual representation of the primary database data remains unaltered. Data compression is an important problem because of storage requirements and the limited bandwidth of various systems. Recently, data compression has become a very widely used concept in computing. Compression reduces bandwidth and storage requirements, encodes data in fewer bits, shortens transmission time, and makes effective use of the channel; these are its clear advantages. Encoding a character in fewer bits is fundamental to an effective compression scheme [8]. By using fewer bits, the original representation is compacted into less storage space [2]. Given the importance of data compression, it is essential to find a suitable encoding technique that represents a data source as accurately as possible using the fewest bits while keeping the meaning of the data safe and unaltered [2][3].

In this paper, a compression scheme is presented that encodes a character in 6 bits rather than the usual 8 bits. It is named 6BC (6-Bit Text database Compression) [1]. Restoring the data from its compressed form to its original format is essential in most cases. For this purpose a decoding technique is developed, and its performance for that operation is very encouraging. There are two classes of data compression techniques: lossless and lossy [1].
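To make the idea of a lossless 6-bit character code concrete, the following is a minimal sketch of fixed-width 6-bit packing with a round trip back to the original text. It is not the 6BC lookup table: the 64-symbol `ALPHABET` and the helper names `pack6` and `unpack6` are assumptions chosen only for illustration.

```python
import string

# Hypothetical 64-symbol alphabet (an assumption for illustration only;
# this is NOT the paper's 4-set lookup table).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + " ."
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def pack6(text):
    """Pack each character as a 6-bit code, most significant bit first."""
    bits = 0      # bit accumulator
    nbits = 0     # number of valid bits currently in the accumulator
    out = bytearray()
    for ch in text:
        bits = (bits << 6) | INDEX[ch]
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def unpack6(data, length):
    """Recover `length` characters from 6-bit packed bytes."""
    bits = 0
    nbits = 0
    chars = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 6 and len(chars) < length:
            nbits -= 6
            chars.append(ALPHABET[(bits >> nbits) & 0x3F])
        bits &= (1 << nbits) - 1
    return "".join(chars)


packed = pack6("Data")
restored = unpack6(packed, 4)
```

In this sketch four 8-bit characters occupy three bytes, a 25% reduction, and decoding returns exactly the input. The character count is passed to `unpack6` because the zero padding in the final byte could otherwise decode as an extra symbol.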
Lossy compression is a type of data compression in which file size is reduced by discarding some less relevant data whose loss will not be noticed by a human after decoding. Video and audio compression are the main examples of lossy data compression [3]. Lossless compression, on the other hand, re-encodes the data to reduce its size without losing any information after decoding. The significance of lossless compression is that it permits no data loss: if even a single bit is lost after decoding, the file is considered corrupted [3][4]. The most attractive application of data compression is compressing data in a database system. Performance improves greatly because a smaller amount of physical data needs to be moved for any operation on the database [2]. Text compression is the process of transforming the original symbols of the source data into smaller symbols such that the shorter representation contains the same information as the original data [5][6][7].

We developed a lossless method named 6BC. The algorithm comprises both encoding and decoding techniques. 6BC converts 8-bit characters to 6 bits by partitioning the characters into 4 sets and placing them in a lookup table. The characters are placed in the lookup table according to their frequency of use in the English language. The location of each character is then used to encode it uniquely in 6 bits. By grouping characters with the same set code together, it produces a short bit-code sequence. It can effectively compress the printable characters, and this compression achieves a promising compression ratio [9].

II. RELATED WORKS

The well-known Huffman algorithm is a lossless, entropy-based algorithm [1]. Its code is constructed from a binary tree.
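As an illustration of how a Huffman code can be built from a binary tree, the following is a minimal sketch of the textbook static construction. It is not the implementation used in the experiments; the function name `huffman_codes` and the sample string are assumptions for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {symbol: bit string} map built from a Huffman tree.

    A tree is either a single symbol (leaf) or a (left, right) tuple.
    """
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree); the unique tie breaker
    # ensures the trees themselves are never compared.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees into one parent node.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            # A one-symbol input still needs a one-bit code.
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    return codes


codes = huffman_codes("aaaabbc")
total_bits = sum(len(codes[ch]) for ch in "aaaabbc")
```

For this sample, the most frequent symbol receives the shortest code, and no code is a prefix of another, so the encoded bit stream can be decoded unambiguously.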
Huffman coding is a variable-length encoding, whereas 6BC is fixed at a size of 6 bits. Its main principle is that more frequent symbols are assigned fewer bits. JPEG files use Huffman coding. JPEG 2000 is a wavelet-based compression scheme for images [11][12]. There are two families of Huffman encoding:
- Adaptive Huffman