2018 21st International Conference of Computer and Information Technology (ICCIT), 21-23 December, 2018
978-1-5386-9242-4/18/$31.00 ©2018 IEEE
A Feasible 6 Bit Text Database Compression
Scheme with Character Encoding (6BC)
Md. Ashiq Mahmood¹, Tarique Latif¹, K.M. Azharul Hasan¹, Md. Riadul Islam²
Department of Computer Science and Engineering (CSE)
¹Khulna University of Engineering & Technology, ²North Western University, Khulna
Khulna, Bangladesh
Email: {ashiqmahmoodbipu, tariquelatifsami, azhasan, riadnwu}@gmail.com
Abstract- Character encoding means representing a
repertoire of characters with some encoding scheme. Encoding
characters efficiently is always desirable because it requires
fewer bits and less storage space for information. Character
encoding has a wide range of applications, including data
communication, data storage, transmission of textual
information, and database technology. In this paper, a new
compression technique for text data, namely 6-Bit Text
database Compression (6BC), is proposed, which encodes each
character in 6 bits. The technique encodes printable
characters in 6 bits by using a lookup table. It converts
8-bit characters into 6 bits by partitioning the characters
into 4 sets and then using the position of each character in
the table to assign it a unique 6-bit code. The technique is
also applied in database technology to compress the text data
in a relation of a database. With the help of the lookup
table, 6BC can both compress and decompress the original
data, and the reverse procedure for decompression that
recovers the original data is also detailed. The output of 6BC
is further compressed with the well-known Huffman and LZW
algorithms. Our experimental results show promising
efficiency. The procedure is further demonstrated with
examples and descriptions.
Keywords- character encoding; compression; decompression;
6 bit text database compression; compression ratio.
I. INTRODUCTION
In data science applications, database compression refers to
storing data in less memory space while keeping the actual
representation of the original database data unchanged. Data
compression is an important issue because of storage capacity
requirements and the limited transfer speed of different
systems. In recent years, data compression has become a widely
used concept in computer science.
Data compression offers clear advantages: reduced bandwidth
and storage requirements, fewer bits per encoded symbol, less
transmission time, and more effective use of the channel.
Encoding a character with fewer bits is essential for an
effective compression scheme [8].
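To make the saving from fewer bits per character concrete: if each character is stored as a 6-bit code instead of an 8-bit byte, four characters fit in three bytes, a 25% reduction. The Python sketch below shows only this bit-packing step. The function names `pack_6bit` and `unpack_6bit` are hypothetical illustrations, not taken from the 6BC paper, and the caller is assumed to keep the number of codes so that the zero padding in the last byte is not mistaken for data.

```python
def pack_6bit(codes: list[int]) -> bytes:
    """Pack 6-bit codes (0..63) into bytes, most significant bit first.

    Hypothetical helper for illustration. If the total number of bits is
    not a multiple of 8, the last byte is padded with zero bits on the right.
    """
    buffer = 0  # bit accumulator
    nbits = 0   # number of not-yet-emitted bits held in the accumulator
    out = bytearray()
    for code in codes:
        if not 0 <= code < 64:
            raise ValueError(f"code {code} does not fit in 6 bits")
        buffer = (buffer << 6) | code
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((buffer >> nbits) & 0xFF)  # emit the top 8 bits
            buffer &= (1 << nbits) - 1            # keep only the leftover bits
    if nbits:
        out.append((buffer << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


def unpack_6bit(data: bytes, count: int) -> list[int]:
    """Read back `count` 6-bit codes; trailing padding bits are ignored."""
    buffer = 0
    nbits = 0
    codes: list[int] = []
    for byte in data:
        buffer = (buffer << 8) | byte
        nbits += 8
        while nbits >= 6 and len(codes) < count:
            nbits -= 6
            codes.append((buffer >> nbits) & 0x3F)  # take the next 6 bits
            buffer &= (1 << nbits) - 1
    return codes
```

For example, `pack_6bit([1, 2, 3, 4])` yields the three bytes `04 20 C4`, and `unpack_6bit` with a count of 4 recovers the original four codes.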
By using fewer bits, the original representation is stored in
less storage space [2]. Given the great importance of data
compression, it is essential to find a suitable encoding
technique that represents a data source as accurately as
possible with fewer bits, while keeping the data safe and
unchanged [2][3]. In this paper, a compression scheme is
presented that encodes a character in 6 bits rather than the
usual 8 bits. It is called 6BC (6-Bit Text database
Compression) [1]. In most cases, restoring the compressed data
to its original form is very important. For this purpose, a
decoding technique is developed, and its performance in that
operation is very encouraging. In fact,
there are two classes of data compression techniques: lossless
and lossy [1]. Lossy compression is a type of data compression
in which the file size is reduced by discarding some irrelevant
data that a human will not perceive after decoding. Video and
audio compression are the main examples of lossy data
compression [3]. Lossless compression, on the other hand,
transforms the data to reduce its size without losing any
information after decoding. The significance of lossless data
compression is that it does not permit any data loss: if even a
single bit of data is lost after decoding, the file is
considered corrupted [3][4]. The most attractive application of
data compression is compressing data in a database system.
Performance is greatly improved because a smaller amount of
physical data needs to be transferred for any operation on the
database [2]. Text compression can be understood as the process
of converting the original symbols of the source data into
smaller symbols while guaranteeing that the result contains the
same information as the original data in a shorter
representation [5][6][7]. We developed a
lossless method named 6BC. The algorithm comprises both
encoding and decoding techniques. The 6BC procedure converts
8-bit characters to 6 bits by partitioning the characters into
4 sets and placing them in a lookup table. We arrange the
characters in the lookup table according to their frequency of
use in the English language. The position of each character in
the table is then used to encode it uniquely in 6 bits. By
grouping characters that share the same set code, it produces
a compact bit-code arrangement. It can effectively compress
printable characters, and this scheme achieves a promising
compression ratio [9].
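This section does not reproduce the 6BC lookup table, so the following Python sketch only illustrates the general idea under stated assumptions: four character sets whose contents, frequency ordering, and newline entry are all hypothetical; a 6-bit code equal to a character's position within its set; and the reserved codes 60 to 63 used as hypothetical "switch to set k" markers, so that a run of characters from the same set needs no extra code. The names `SETS`, `SWITCH_BASE`, `encode_6bit`, and `decode_6bit` are illustrative and are not the authors' identifiers.

```python
import string

# HYPOTHETICAL lookup table (assumption): not the table from the 6BC paper.
# Within each set, a character's 6-bit code is its position in the string.
# Letters are ordered by approximate English frequency (also an assumption).
SETS = [
    " etaoinshrdlcumwfgypbvkjxqz",  # set 0: space + lowercase letters
    "ETAOINSHRDLCUMWFGYPBVKJXQZ",   # set 1: uppercase letters
    "0123456789\n",                 # set 2: digits + newline
    string.punctuation,             # set 3: the 32 ASCII punctuation marks
]
SWITCH_BASE = 60  # hypothetical: codes 60..63 mean "switch to set 0..3"

# Reverse map: character -> (set index, position within that set)
POSITION = {ch: (s, i) for s, chars in enumerate(SETS) for i, ch in enumerate(chars)}


def encode_6bit(text: str) -> list[int]:
    """Encode text as a list of 6-bit codes (0..63)."""
    codes = []
    current = 0  # encoder and decoder both start in set 0
    for ch in text:
        if ch not in POSITION:
            raise ValueError(f"character {ch!r} is not in the lookup table")
        s, pos = POSITION[ch]
        if s != current:
            codes.append(SWITCH_BASE + s)  # emit a set-switch code
            current = s
        codes.append(pos)  # same-set characters need only their position
    return codes


def decode_6bit(codes: list[int]) -> str:
    """Invert encode_6bit using the same hypothetical lookup table."""
    chars = []
    current = 0
    for code in codes:
        if code >= SWITCH_BASE:
            current = code - SWITCH_BASE  # change the active set
        else:
            chars.append(SETS[current][code])
    return "".join(chars)
```

For instance, `encode_6bit("the")` gives `[2, 8, 1]` (18 bits instead of 24), whereas a set change costs one extra code, so `encode_6bit("A")` gives `[61, 2]`. Under this assumed design, the actual saving therefore depends on how often the text moves between sets; the resulting codes can then be packed four per three bytes.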
II. RELATED WORKS
A well-known lossless algorithm based on entropy is the
Huffman algorithm [1]. Here the code is constructed from a
binary tree. It is also called variable-length encoding because
it assigns codes of variable length; 6BC, in contrast, uses a
fixed size of 6 bits. Its main principle is that more frequent
symbols are assigned fewer bits. JPEG files use Huffman coding.
JPEG 2000 is a wavelet-based compression scheme for images
[11][12]. There are two
families in Huffman Encoding which are: - Adaptive Huffman