ISSN 1054-6618, Pattern Recognition and Image Analysis, 2017, Vol. 27, No. 2, pp. 326–337. © Pleiades Publishing, Ltd., 2017.

Urban Areas Extraction from Multi Sensor Data Based on Machine Learning and Data Fusion¹

S. Puttinaovarat a, * and P. Horkaew b, **
a Faculty of Science and Industrial Technology, Prince of Songkla University, Surat Thani Campus, Surat Thani, Thailand
b School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand
*e-mail: supattra.p@psu.ac.th
**e-mail: phorkaew@sut.ac.th

Abstract: Accurate information on urban areas is important for a variety of applications, especially city planning and natural disaster prediction and management. In recent years, extraction of urban structures from remotely sensed images has been extensively explored. The key advantages of this imaging modality are reduced surveying expense and time. It also alleviates restrictions on ground surveys. Thus far, most research has extracted these structures from very high resolution satellite imagery, which is unfortunately of relatively poor spectral resolution, resulting in good precision yet moderate accuracy. Therefore, this paper investigates extraction of buildings from middle and high resolution satellite images by using spectral indices (Normalized Difference Building Index: NDBI, Normalized Difference Vegetation Index: NDVI, Soil Adjusted Vegetation Index: SAVI, Modified Normalized Difference Water Index: MNDWI, and Global Environment Monitoring Index: GEMI) by means of various Machine Learning methods (Artificial Neural Network: ANN, K-Nearest Neighbor: KNN, and Support Vector Machine: SVM) and Data Fusion (i.e., Majority Voting). The empirical results suggest that suitable learning methods for urban areas extraction are, in order of preference, Data Fusion, SVM, KNN, and ANN. Their accuracies were 85.46, 84.86, 84.66, and 84.91%, respectively.
Keywords: urban areas extraction, spectral indices, machine learning, data fusion
DOI: 10.1134/S1054661816040131

¹ The article is published in the original.

INTRODUCTION

Extraction of urban areas (e.g., buildings and other manmade structures) plays a crucial part in a variety of geographical applications such as city planning, natural disaster simulation, prediction and management, as well as regional change detection [1]. Information regarding manmade structures, such as their extent, formation, and pattern, is necessary for preparing the input data for natural disaster simulation [2]. Moreover, it is also used in assessing post-disaster damage [3]. In recent years, extraction of urban areas or buildings from remotely sensed images has been a widely investigated field. It has consequently been well recognized that satellite images with high [4–6] and middle [7] resolutions, as well as LIDAR [8, 9] and SAR [10] images, are useful sources of information for this procedure. Several multi-spectral reflectance indices have also been proposed for estimation and extraction of land covers and land uses, such as water body extraction [11], forest land cover classification [12], and road extraction [13]. The spectral indices most frequently used for detection of urban structures are NDBI [14], NDVI [15], SAVI [16], and MNDWI [17], while some existing indices are not readily suitable for this purpose. They include GEMI [18], IBI [19], NBI [20], and BUI [21]. Many studies have presented procedures for building extraction from high resolution satellite imagery (e.g., QuickBird, SPOT, IKONOS, THEOS, etc.) [4–6], but these data are often quite expensive and difficult to obtain. Furthermore, these high resolution images are typically of low spectral resolution [22, 23].
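As an illustration, the four indices most frequently used for detection of urban structures can be written as simple band ratios. The sketch below follows the commonly published definitions of NDVI, NDBI, MNDWI, and SAVI. The band arguments (green, red, near-infrared, and shortwave-infrared reflectances), the soil factor of 0.5, and the sample pixel values are assumptions made for illustration and are not taken from this paper.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: high over green vegetation."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Normalized Difference Building Index: tends to be positive over built-up land."""
    return normalized_difference(swir, nir)


def mndwi(green, swir):
    """Modified Normalized Difference Water Index: tends to be positive over open water."""
    return normalized_difference(green, swir)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor = 0.5 is a commonly used default."""
    denom = nir + red + soil_factor
    return (1.0 + soil_factor) * (nir - red) / denom if denom != 0 else 0.0


# Hypothetical surface reflectances for a single pixel (illustrative values only).
pixel = {"green": 0.12, "red": 0.15, "nir": 0.20, "swir": 0.30}

# One possible per-pixel feature vector that a classifier could be trained on.
features = [
    ndbi(pixel["swir"], pixel["nir"]),
    ndvi(pixel["nir"], pixel["red"]),
    savi(pixel["nir"], pixel["red"]),
    mndwi(pixel["green"], pixel["swir"]),
]
print(features)
```

Each index is bounded and unitless, so the resulting feature vector can be passed to a classifier without further band-specific scaling. Whether the study used exactly these band assignments is not stated in this section.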
On the other hand, middle resolution satellite imagery has attractive qualities, i.e., its availability at low or no cost [24, 25] and its relatively high spectral resolution [26]. Despite its superior precision [27] (i.e., being able to detect buildings at a smaller scale), urban extraction based on these satellite images has relied primarily on single band spatial data, and hence has suffered from moderately accurate and less reliable observations, due to compromised acquisition conditions. Currently, Landsat images are being used to support a wide range of building or urban areas extraction applications [7, 28–30]. Motivated by these

APPLIED PROBLEMS
Received February 3, 2016