International Journal of Scientific Research in Engineering and Management (IJSREM)
Volume: 05 Issue: 06 | June - 2021 ISSN: 2582-3930
© 2021, IJSREM | www.ijsrem.com | Page 1
Extracting Latent Variables and Implementing in Multimodal
Variational Autoencoders
Amit Khare
Arya Pratap Singh
Acropolis Institute of Technology and Research, Indore (M.P.)
Abstract - Autoencoders are transforming
how we process data today, with
applications in dimensionality reduction,
image compression, image denoising,
feature extraction, image generation,
sequence-to-sequence prediction,
recommendation systems, and more. This
paper focuses on dimensionality reduction,
specifically on extracting latent
variables.
Several encoding and decoding techniques
have been used to extract latent
variables; this paper presents an entirely
new technique that harnesses the power of
autoencoders for this
extraction. Machine learning is about
capturing aspects of the unknown distribution
capturing aspects of the unknown distribution
from which the observed data are sampled (the
data-generating distribution). For many
learning algorithms and in particular in
manifold learning, the focus is on identifying
the regions (sets of points) in the space of
examples where this distribution concentrates,
i.e., which configurations of the observed
variables are plausible. Unsupervised
representation-learning algorithms try to
characterize the data-generating distribution
through the discovery of a set of features or
latent variables whose variations capture most
of the structure of the data-generating
distribution.
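As a minimal illustration of extracting latent variables with an
autoencoder bottleneck, consider the following toy sketch. It is not
the paper's implementation: the data, the linear architecture, and all
sizes and names are illustrative. A 2-unit bottleneck forces the
network to summarize each 8-dimensional sample with two latent
variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples of 8 observed variables that lie near a
# 2-D manifold (two latent factors plus a little noise).
z_true = rng.normal(size=(100, 2))
W_true = rng.normal(size=(2, 8))
X = z_true @ W_true + 0.05 * rng.normal(size=(100, 8))

# A linear autoencoder: the encoder W_e compresses each sample to
# two latent variables, the decoder W_d reconstructs the 8 observed
# variables from them.
latent_dim = 2
W_e = rng.normal(scale=0.1, size=(8, latent_dim))
W_d = rng.normal(scale=0.1, size=(latent_dim, 8))

lr = 0.01
for _ in range(2000):
    Z = X @ W_e        # encode: latent variables
    X_hat = Z @ W_d    # decode: reconstruction
    err = X_hat - X
    # Gradient descent on the mean squared reconstruction error
    grad_Wd = Z.T @ err / len(X)
    grad_We = X.T @ (err @ W_d.T) / len(X)
    W_d -= lr * grad_Wd
    W_e -= lr * grad_We

# The encoder output is the extracted latent representation.
latents = X @ W_e
print(latents.shape)  # (100, 2)
```

Because the data are generated from two underlying factors, training
the reconstruction objective drives the bottleneck to recover a
two-dimensional code that captures most of the structure of the
data-generating distribution, exactly the goal described above.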
1. INTRODUCTION
Affective Computing studies frequently
collect rich, multimodal data from a
number of different sources in order to
model and recognize human affect.
These data sources — whether they are
physiological sensors, smartphone apps,
eye trackers, cameras, or microphones —
are often noisy or missing. Increasingly, such
studies take place in natural environments over long
periods of time, where the problem of missing data
is exacerbated. For example, a system trying to learn
how to forecast a depressed mood may need to run
for many weeks or months, during which time
participants are likely to not always wear their
sensors, and sometimes miss filling out surveys.
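The compounding effect of such gaps can be made concrete with a toy
simulation (the 80% rate and the three modalities are illustrative
assumptions, not figures from any study): if each source is
individually observed for 80% of samples, only about half of the
samples end up with every modality present.

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples = 10_000
p_present = 0.8  # hypothetical rate: each modality observed 80% of the time

# Independent presence masks for three hypothetical modalities
# (e.g., a wearable sensor, a survey, a phone log).
masks = rng.random((3, n_samples)) < p_present

# A sample is usable for naive "complete-case" training only when
# every modality is present for it.
complete = masks.all(axis=0)
print(round(float(complete.mean()), 3))  # close to 0.8 ** 3 = 0.512
```

Each additional noisy source multiplies in another factor below one,
which is why the intersection of clean samples shrinks so quickly.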
While research has shown that combining more
data sources can lead to better predictions, as each
noisy source is added, the intersection of samples
with clean data from every source becomes smaller
and smaller. As the need for long-term multimodal
data collection grows, especially for challenging
topics such as forecasting mood, the problem of
missing data sources becomes especially
pronounced. While there are a number of techniques
for dealing with missing data, more often than not
researchers may choose to simply discard samples
that are missing one or more modalities. This can
lead to a dramatic reduction in the number of
samples available to train an affect recognition
model, a significant problem for data-hungry
machine learning models. Worse, if the data are not
missing completely at random, this can bias the
resulting model. In this paper we propose a novel
method for dealing with missing multimodal data
based on the idea of denoising autoencoders. A
denoising autoencoder is an unsupervised learning