Semantic Segmentation Datasets for Resource Constrained Training

Ashutosh Mishra 1*, Sudhir Kumar 1,2*, Tarun Kalluri 1,3*, Girish Varma 1, Anbumani Subramanian 4, Manmohan Chandraker 3, and C V Jawahar 1

1 IIIT Hyderabad  2 University at Buffalo, State University of New York  3 University of California, San Diego  4 Intel Bangalore

* Equal contribution.

Abstract. Several large-scale datasets, coupled with advances in deep neural network architectures, have been greatly successful in pushing the boundaries of semantic segmentation performance in recent years. However, the scale and magnitude of such datasets prohibit the ubiquitous use and widespread adoption of such models, especially in settings with serious hardware and software resource constraints. Through this work, we propose two simple variants of the recently proposed IDD dataset, namely IDD-mini and IDD-lite, for scene understanding in unstructured environments. Our main objective is to enable research and benchmarking in training segmentation models. We believe that this will enable quick prototyping, useful in applications like optimum parameter and architecture search, and encourage deployment on low-resource hardware such as the Raspberry Pi. We show qualitatively and quantitatively that with only 1 hour of training on 4 GB of GPU memory, we can achieve satisfactory semantic segmentation performance on the proposed datasets.

Keywords: Semantic Segmentation, Neural Architecture Search

1 Introduction and Related Work

Semantic segmentation is the task of assigning pixel-level semantic labels to images, with potential applications in fields such as autonomous driving [5,16] and scene understanding. Many approaches based on modern deep neural networks have been proposed to tackle this task [18,12,4,14]. The majority of the proposed approaches use encoder-decoder networks that aggregate spatial information across various resolutions for pixel-level labeling of images.
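The encoder-decoder pattern described above can be sketched with a toy NumPy example (not from the paper; a minimal illustration under the assumption of 2x2 average-pool downsampling, nearest-neighbour upsampling, and additive skip fusion, which real architectures replace with learned convolutions):

```python
import numpy as np

def downsample2x(x):
    # Encoder step: 2x2 average pooling halves the spatial resolution.
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    # Decoder step: nearest-neighbour upsampling doubles the resolution.
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 8))      # toy full-resolution feature map

coarse = downsample2x(downsample2x(feat))  # (3, 2, 2): low-res global context
restored = upsample2x(upsample2x(coarse))  # back to (3, 8, 8)

# Skip-style fusion: combine coarse context with full-resolution detail,
# which is how encoder-decoder nets aggregate information across resolutions.
fused = restored + feat
print(fused.shape)                          # (3, 8, 8)
```

A real segmentation head would then map the fused features to per-pixel class scores; the sketch only shows how information from multiple resolutions is combined.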
For example, [12] proposes an end-to-end trainable network for semantic segmentation by replacing the fully connected layers of a pretrained AlexNet [8] with fully convolutional layers. Segmentation architectures based on dilated convolutions [17] for real-time performance have also been proposed [18,14]. However, most of these approaches come with a huge overhead in training and inference time since it
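The "convolutionalization" idea behind [12] can be demonstrated numerically: a fully connected layer applied to a C x H x W feature map is equivalent to a convolution whose kernel spans the whole H x W extent, and the same reshaped weights then produce a spatial score map on larger inputs. The following is a self-contained NumPy sketch (all names and sizes here are illustrative, not from the paper):

```python
import numpy as np

def conv2d_valid(x, w):
    # Naive valid cross-correlation: x is (C, H, W), w is (K, C, kh, kw),
    # output is (K, H - kh + 1, W - kw + 1).
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[k])
    return out

rng = np.random.default_rng(0)
C, H, W, K = 4, 3, 3, 5
feat = rng.standard_normal((C, H, W))
fc_w = rng.standard_normal((K, C * H * W))   # classifier's fully connected layer

# FC output on the training-size feature map.
fc_out = fc_w @ feat.reshape(-1)

# Same weights reshaped into a conv kernel covering the full H x W extent:
# the two layers compute identical outputs at the original size.
conv_w = fc_w.reshape(K, C, H, W)
conv_out = conv2d_valid(feat, conv_w)        # shape (K, 1, 1)
assert np.allclose(fc_out, conv_out.reshape(-1))

# On a larger input the convolutionalized layer yields a spatial score map
# instead of a single prediction, enabling dense pixel-level labeling.
big = rng.standard_normal((C, 8, 8))
print(conv2d_valid(big, conv_w).shape)       # (5, 6, 6)
```

This is exactly why converting classifiers into fully convolutional networks lets them accept arbitrary input sizes and emit coarse per-location predictions, which are then upsampled to pixel resolution.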