International Journal of Scientific Research in Engineering and Management (IJSREM) Volume: 05 Issue: 06 | June - 2021 ISSN: 2582-3930 © 2021, IJSREM | www.ijsrem.com

Extracting Latent Variables and Implementing in Multimodal Variational Autoencoders

Amit Khare, Arya Pratap Singh
Acropolis Institute of Technology and Research, Indore (M.P.)

Abstract - Autoencoders are transforming how we process data today: dimensionality reduction, image compression, image denoising, feature extraction, image generation, sequence-to-sequence prediction, recommendation systems, and more. This paper focuses on dimensionality reduction, specifically on extracting latent variables. Several encoding and decoding techniques have been used to extract latent variables; this paper presents a new technique that harnesses the power of autoencoders for this extraction. Machine learning is about capturing aspects of the unknown distribution from which the observed data are sampled (the data-generating distribution). For many learning algorithms, and in manifold learning in particular, the focus is on identifying the regions (sets of points) in the space of examples where this distribution concentrates, i.e., which configurations of the observed variables are plausible. Unsupervised representation-learning algorithms try to characterize the data-generating distribution by discovering a set of features or latent variables whose variations capture most of the structure of that distribution.

1. INTRODUCTION

Affective computing studies frequently collect rich, multimodal data from a number of different sources in order to model and recognize human affect. These data sources, whether physiological sensors, smartphone apps, eye trackers, cameras, or microphones, are often noisy or incomplete.
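The latent-variable extraction idea from the abstract can be illustrated with a minimal sketch: a single-hidden-layer linear autoencoder, trained by gradient descent, that compresses 8-dimensional inputs into a 2-dimensional latent code and reconstructs them. The architecture, dimensions, and learning rate here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, latent_dim=2, lr=0.01, epochs=2000):
    """Fit a linear autoencoder X -> Z -> X_hat by gradient descent."""
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, latent_dim))   # encoder weights
    W_dec = rng.normal(0, 0.1, (latent_dim, d))   # decoder weights
    for _ in range(epochs):
        Z = X @ W_enc                 # latent variables (codes)
        err = Z @ W_dec - X           # reconstruction error
        # gradients of the squared-error loss w.r.t. each weight matrix
        grad_dec = Z.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

# toy data that actually lies on a 2-D subspace of R^8,
# so a 2-D latent code can capture most of its structure
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(200, 2)) @ basis

W_enc, W_dec = train_autoencoder(X)
Z = X @ W_enc                          # extracted latent variables
mse = np.mean((Z @ W_dec - X) ** 2)
print(Z.shape)
```

Because the toy data are generated from a 2-dimensional latent space, the learned 2-dimensional code suffices to reconstruct the inputs well; on real data the latent dimension is a modeling choice.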
Increasingly, such studies take place in natural environments over long periods of time, where the problem of missing data is exacerbated. For example, a system trying to learn how to forecast a depressed mood may need to run for many weeks or months, during which time participants may not always wear their sensors and may sometimes miss filling out surveys. While research has shown that combining more data sources can lead to better predictions, as each noisy source is added, the intersection of samples with clean data from every source becomes smaller and smaller. As the need for long-term multimodal data collection grows, especially for challenging topics such as forecasting mood, the problem of missing data sources becomes especially pronounced.

While there are a number of techniques for dealing with missing data, more often than not researchers choose to simply discard samples that are missing one or more modalities. This can lead to a dramatic reduction in the number of samples available to train an affect recognition model, a significant problem for data-hungry machine learning models. Worse, if the data are not missing completely at random, discarding them can bias the resulting model.

In this paper we propose a novel method for dealing with missing multimodal data based on the idea of denoising autoencoders. A denoising autoencoder is an unsupervised learning
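The denoising idea introduced above can be sketched concretely: simulate two modalities (here, hypothetical "sensor" and "survey" feature blocks), corrupt each training sample by zeroing out one whole modality, and train an autoencoder to reconstruct the complete clean vector from the corrupted one. The modality sizes, latent dimension, and training settings are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
d_sensor, d_survey = 5, 3
d = d_sensor + d_survey

# correlated multimodal data: both modalities are driven by a shared
# 2-D latent factor, so one modality carries information about the other
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, d))   # clean samples, shape (300, 8)

def mask_modality(X):
    """Corrupt each sample by dropping (zeroing) one entire modality."""
    X_noisy = X.copy()
    drop_survey = rng.random(len(X)) < 0.5
    X_noisy[drop_survey, d_sensor:] = 0.0      # survey block missing
    X_noisy[~drop_survey, :d_sensor] = 0.0     # sensor block missing
    return X_noisy

# linear denoising autoencoder trained by gradient descent
W_enc = rng.normal(0, 0.1, (d, 4))
W_dec = rng.normal(0, 0.1, (4, d))
lr, n = 0.01, len(X)
for _ in range(2000):
    X_noisy = mask_modality(X)         # fresh corruption each epoch
    Z = X_noisy @ W_enc
    err = Z @ W_dec - X                # target is the CLEAN input
    W_dec -= lr * (Z.T @ err / n)
    W_enc -= lr * (X_noisy.T @ (err @ W_dec.T) / n)

X_rec = mask_modality(X) @ W_enc @ W_dec
mse = np.mean((X_rec - X) ** 2)
print(X_rec.shape)
```

The key design choice is that the loss always compares against the clean sample, so the network learns to fill in a missing modality from whichever modality is present, rather than merely copying its input.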