An Unsupervised Deep Learning Approach for Satellite Image Analysis with Applications in Demographic Analysis

Jessica Block 1,*, Mehrdad Yazdani 1,*, Mai Nguyen 2, Daniel Crawl 2, Marta Jankowska 1, John Graham 1, Tom DeFanti 1 and Ilkay Altintas 2

Abstract—High-resolution satellite imagery is a growing source of data with potential applications in many diverse domains. Efficient large-scale analysis of this rich data can lead to unprecedented discoveries with societal impact. We present a new framework that uses unsupervised learning techniques to organize collections of satellite images into demographically relevant categories. Our framework first uses pre-trained Convolutional Neural Networks to extract features from tiles of high-resolution satellite images of a city. The k-means algorithm is then applied to these features to organize the images into visually similar groups. The resulting image clusters are validated using demographic data. The cluster model is then applied to six different cities around the world to test the transferability of our methods. Finally, the discovered image clusters are visualized in our customized web interface, which helps demographers, social scientists, and economists understand the organization of a city.

I. INTRODUCTION

All over the world, people are migrating into cities at higher rates than ever before, and for the first time, more people in the world live in cities than in rural areas [1]. Construction is occurring rapidly to accommodate this migration, and cities are growing at unprecedented rates. Monitoring these changes is critical for understanding their local and global impacts, and it is becoming increasingly difficult. In some countries, regulated construction provides documentation of when, where, and how much construction will occur and how many people it can accommodate.
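The abstract's clustering step, k-means over per-tile CNN features, can be sketched as plain Lloyd's iterations. This is a minimal sketch. It assumes the CNN features have already been extracted into fixed-length lists of floats. The names `kmeans` and `squared_distance` are hypothetical. The random initialisation is a simplification: library implementations such as scikit-learn's `KMeans` default to k-means++ seeding.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: Sequence[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means (hypothetical helper); returns (centroids, labels)."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random(seed)
    # Simplified initialisation: k distinct samples chosen at random.
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each tile's feature vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(f, centroids[c]))
                      for f in features]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

In the framework described above, each row of `features` would be one tile's CNN feature vector. The returned labels are the image groups that are later validated against demographic data.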
However, in developing cities, governments and social organizations need to know where people are concentrated, because a significant percentage of city dwellers live in informal settlements or slums. Counting people manually costs significant money and manpower. If the extent of informal settlements is not known, it is also difficult to estimate how much manpower will be needed in different areas to conduct a census or to provide basic services. Satellite imagery offers an exciting opportunity to address this problem by detecting and measuring human settlements of different types. Previous work has taken advantage of free US government satellite sensors such as MODIS [2] and VIIRS [3]. These sensors revisit a location at least twice a day, and their spatial resolutions (250 to 1000 meters) are good enough to detect major city features such as night lights [4]. LANDSAT collects imagery at a lower temporal resolution but has a spatial resolution of 30 meters, which can detect major roads and generalized vegetation [5]. Imagery from Digital Globe [6] satellites, by contrast, has a spatial resolution of 0.5 meters and can detect features such as individual trees, human structures, and cars. Even if the thermal spectral bands are not used, the textural and spatial features in imagery of this resolution can be leveraged using Convolutional Neural Networks (CNNs), which were originally developed to analyze pictures from hand-held cameras. Informal settlements have distinctive visual features that differ from those of formally planned urbanization.

*Correspondence: j.block@eng.ucsd.edu and myazdani@gmail.com
1 Qualcomm Institute, University of California San Diego, La Jolla, CA U.S.A. myazdani@gmail.com, j.block@eng.ucsd.edu, majankowska@ucsd.edu, jjgraham@eng.ucsd.edu, tdefanti@eng.ucsd.edu
2 San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA U.S.A. mhnguyen@sdsc.edu, crawl@sdsc.edu, altintas@sdsc.edu
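A short calculation shows why the jump from 30 m to 0.5 m resolution matters for detecting structures and cars. This is a sketch under stated assumptions. The 4.5 m car length is an illustrative value. The 224-pixel tile is a common CNN input size and is not taken from the paper. The helper names are hypothetical. Only the ground sample distances come from the text.

```python
def ground_footprint_m(tile_px: int, gsd_m: float) -> float:
    """Side length on the ground (meters) covered by a square tile of tile_px pixels."""
    return tile_px * gsd_m


def pixels_across(object_m: float, gsd_m: float) -> float:
    """Approximate number of pixels spanning an object of the given size."""
    return object_m / gsd_m


# Ground sample distances quoted in the text, in meters per pixel
# (the finest MODIS resolution is used for MODIS).
SENSORS = {"MODIS": 250.0, "LANDSAT": 30.0, "Digital Globe": 0.5}

for name, gsd in SENSORS.items():
    # Assumed ~4.5 m car and assumed 224-pixel CNN input tile.
    print(f"{name}: {pixels_across(4.5, gsd):.2f} px per car, "
          f"224-px tile covers {ground_footprint_m(224, gsd):.0f} m")
```

At 0.5 m a car spans about nine pixels, so a CNN can see its shape and texture. At 30 m or coarser it is well below a single pixel.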
By integrating open geospatial tools with CNNs, we performed a spatially relevant evaluation of neighborhoods, using Mumbai, India as our test case. The motivation for this project comes from our evaluation of the Indicus demographic data (provided by Indicus Analytics, A Nielsen Company) [7] against the high-resolution satellite imagery. Figure 1(a) shows the geographic extent of the defined neighborhoods, and Figure 1(b) shows the per capita income of each neighborhood. However, within each neighborhood there are clear distinctions between formal and informal settlements, as shown in Figure 1(c). This indicates that the neighborhoods designated in the Indicus demographic data are too coarse-grained to accurately represent the populations within them. We hypothesized that unsupervised machine learning could distinguish formal from informal settlements within these neighborhoods at a higher spatial resolution.

Contributions. In this paper, we describe our methods for investigating whether unsupervised machine learning can distinguish the spatial characteristics of different socioeconomic regions, specifically slums, from those of other neighborhoods in places where high-resolution demographic data are not available. In particular, we (i) define a data preparation pipeline for preprocessing satellite imagery for machine learning; (ii) introduce methods that use deep learning to extract features from satellite imagery; (iii) introduce an ordered clustering method to organize image collections; (iv) validate the resulting machine learning methodology on Mumbai using demographic data; and (v) test the transferability of the methodology from Mumbai to other cities around the world.

Figure 2 describes our end-to-end processing pipeline for unsupervised deep learning using satellite imagery. The green data preparation component in this figure illustrates the steps that take the ingested satellite imagery to cropped image tiles ready for data-parallel feature extraction and analysis.
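The data preparation step turns a large scene into cropped image tiles, each of which can be located on a map. It can be sketched as follows. This is a minimal sketch under three assumptions: the raster is north-up, it is described by a GDAL-style affine transform without rotation terms, and edge tiles smaller than the CNN input are simply dropped. `Tile`, `tile_grid` and `tile_center_xy` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Tile:
    col: int   # pixel offset of the tile's left edge
    row: int   # pixel offset of the tile's top edge
    size: int  # tile side length in pixels


def tile_grid(width: int, height: int, size: int,
              stride: Optional[int] = None) -> Iterator[Tile]:
    """Yield square tiles covering a width x height raster, row by row.

    Tiles that would run past the raster edge are skipped, so every tile
    has the full fixed size expected by a pre-trained CNN.
    """
    stride = stride or size  # non-overlapping tiles by default
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            yield Tile(col, row, size)


def tile_center_xy(tile: Tile, origin_x: float, origin_y: float,
                   pixel_w: float, pixel_h: float) -> Tuple[float, float]:
    """Map a tile's center to map coordinates with a north-up affine transform.

    pixel_h is negative for north-up rasters (y decreases going down rows).
    """
    cx = tile.col + tile.size / 2
    cy = tile.row + tile.size / 2
    return origin_x + cx * pixel_w, origin_y + cy * pixel_h
```

For example, a 1000 × 500 pixel raster cut into 224-pixel tiles yields 8 full tiles (4 columns by 2 rows). The tile centers could then be joined to neighborhood polygons so that clusters can be compared with the demographic data.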
The cropped images are then processed by a machine learning pipeline, illustrated in the blue component. This pipeline includes CNN-based feature extraction, clustering and principal component analysis, and a sorting algorithm that organizes the related components into groups. The resulting sorted cluster histograms are then visualized in a web-based map interface.
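The PCA and sorting step can be illustrated by one plausible ordering scheme. The k-means centroids are projected onto their first principal component, and clusters are sorted by that coordinate so that similar clusters sit next to each other. Tile counts can then be reported in that order as a histogram. This is a sketch under that assumption, not necessarily the authors' sorting algorithm. The function names are hypothetical. The sign of a principal component is arbitrary, so the resulting order may come out reversed.

```python
from collections import Counter
from typing import List, Sequence


def first_principal_component(points: Sequence[Sequence[float]],
                              iters: int = 200) -> List[float]:
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / max(n - 1, 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # v lies in the covariance null space (e.g. identical points)
        v = [x / norm for x in w]
    return v


def cluster_order(centroids: Sequence[Sequence[float]]) -> List[int]:
    """Cluster ids sorted by their centroid's projection on the first PC."""
    pc = first_principal_component(centroids)
    proj = [sum(a * b for a, b in zip(c, pc)) for c in centroids]
    return sorted(range(len(centroids)), key=lambda k: proj[k])


def sorted_histogram(labels: Sequence[int], order: Sequence[int]) -> List[int]:
    """Tile counts per cluster, listed in the PCA-based cluster order."""
    counts = Counter(labels)
    return [counts.get(k, 0) for k in order]
```

In use, `labels` would be the cluster labels of the tiles that fall inside one neighborhood. This gives one sorted histogram per neighborhood that a map interface could display.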