International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-10, August 2019
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number I8157078919/2019©BEIESP
DOI: 10.35940/ijitee.I8157.0881019
Abstract: Due to the growth of multi-modal data, a large amount of
data is being generated. Nearest Neighbor (NN) search is used to
retrieve information, but it suffers when the data is
high-dimensional. Approximate Nearest Neighbor (ANN) search is
therefore a method extensively used by researchers, in which data
is represented in the form of binary codes using semantic hashing.
Such a representation reduces the storage cost and improves the
retrieval speed. In addition, deep learning has shown good
performance in information retrieval and efficiently handles the
scalability problem. The modalities of multi-modal data have
different statistical properties, so a method is needed that finds
the semantic correlation between them. In this paper, an experiment
is performed using the correlation methods CCA, KCCA and DCCA on
the MNIST dataset. MNIST is used as a multi-view dataset, and the
results show that DCCA outperforms CCA and KCCA by learning
representations with higher correlations. Moreover, because the
requirements of users are flexible, cross-modal retrieval, which
works across modalities, plays a very important role. Traditional
cross-modal hashing techniques are based on hand-crafted features,
so their performance is not satisfactory, since feature learning
and binary-code generation are independent processes. In addition,
traditional cross-modal hashing techniques fail to bridge the
heterogeneity gap across modalities. Many deep cross-modal hashing
techniques have therefore been proposed that improve performance
in comparison with non-deep cross-modal techniques. In this paper,
we present a comprehensive survey of hashing techniques that work
across modalities.
Index Terms: Multi-Modal data, Deep CCA (DCCA),
cross-modal retrieval, hashing.
I. INTRODUCTION
Due to the advancement of the World Wide Web, different types
of data such as text, images, audio and video are generated that are
semantically consistent. Such data is called multi-modal data.
As the requirements of users are very flexible, there is a need to
develop a retrieval system that works across different modalities.
Such retrieval is called cross-modal retrieval, where users can
give any modality as the input [1, 3, 6, 21, 27]. In addition,
such retrieval provides complementary information which
may be useful in decision making or in any recommendation
system. Nearest Neighbor (NN) search is widely used in information
retrieval but becomes very expensive as dimensionality increases, so
researchers are focusing on Approximate Nearest Neighbor (ANN)
search, which resolves the problem of NN by giving an approximate
solution [3, 4]. Among ANN indexing schemes, hashing, which maps
high-dimensional data to binary codes, is more widely used than
tree-based schemes [21, 27]. Such a binary representation requires
less time and less space, which helps efficient retrieval [2, 4, 5].

Revised Manuscript Received on August 05, 2019.
Nikita Bhatt, U & P U Patel Department of Computer Engineering, CSPIT,
CHARUSAT, Gujarat, India. E-mail: nikitabhatt.ce@charusat.ac.in
Dr. Amit Ganatra, U & P U Patel Department of Computer Engineering,
CSPIT, CHARUSAT, Gujarat, India.
In order to retrieve information across multi-modal data,
multi-modal hashing (MMH) is used. MMH is categorized into
two parts: Multi-Source Hashing (MSH) and Cross-Modal
Hashing (CMH). However, the application scenario of MSH is limited
in comparison with CMH, because MSH requires all modalities of the
data to be present, which might not hold in practical scenarios.
So CMH is used, which explores the correlation among modalities to
enable cross-modal similarity search [11, 18, 25, 26, 27]. Existing
CMH strategies perform feature learning and hash-code learning as
independent steps, which may not achieve satisfactory results. As
the emerging technique of deep learning has shown promising results
in feature generation, it is used not only as a feature extractor
but also as a hash-code generator, and both are done in a single
framework [2, 3]. The remaining portion of the paper covers
different cross-modal retrieval methods.
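Hashing-based ANN can be illustrated with a small sketch. The following toy example (illustrative only, not a method from the surveyed literature; all names and parameters are hypothetical) uses random-hyperplane locality-sensitive hashing: each point's binary code is the sign pattern of its projections onto random hyperplanes, and retrieval ranks database items by Hamming distance to the query code.

```python
import numpy as np

def hash_codes(X, hyperplanes):
    # One bit per hyperplane: 1 if the point lies on its positive side.
    return (X @ hyperplanes.T > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=3):
    # Rank database codes by Hamming distance to the query code.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")[:k]

rng = np.random.default_rng(0)
dim, n_bits = 64, 32
database = rng.standard_normal((1000, dim))
planes = rng.standard_normal((n_bits, dim))             # random hyperplane normals

db_codes = hash_codes(database, planes)
query = database[42] + 0.01 * rng.standard_normal(dim)  # near-duplicate of item 42
q_code = hash_codes(query[None, :], planes)[0]

print(hamming_search(q_code, db_codes))                 # item 42 should rank near the top
```

Because query time needs only bit comparisons, each item is stored as n_bits bits instead of dim floating-point values, which is the storage and speed advantage referred to above.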
However, as the modalities of multi-modal data have different
statistical properties, it is necessary to find the semantic
similarity between them for efficient retrieval. Many methods are
available in the literature that find semantic similarity between
multi-modal data. Here, an experiment is performed using
canonical correlation analysis (CCA), kernel canonical
correlation analysis (KCCA) and deep canonical correlation
analysis (DCCA).
II. STUDY ON CROSS-MODAL RETRIEVAL METHODS
Cross-modal retrieval systems are broadly divided into
common subspace-learning methods and cross-modal hashing
methods [31]. In common subspace-learning, different
modalities are mapped to a common subspace that
preserves the similarity across modalities. For faster retrieval,
the common representation is mapped to binary codes using
hashing techniques. The remaining portion covers different
methods for common subspace-learning and cross-modal
hashing [31].
A. Common Subspace-Learning
The subspace-learning technique learns a common subspace in
order to preserve the correlations among various modalities,
where the similarity can be directly calculated [10].
Figure 1 shows how a common subspace is
Semantic Correlation Based Deep Cross-Modal
Hashing For Faster Retrieval
Nikita Bhatt, Amit Ganatra