Incorporating Uncertainty in the Accuracy Assessment of Land Cover Maps using Fuzzy Numbers and Fuzzy Arithmetic

Pedro Sarmento, Hugo Carrão and Mario Caetano
Centro de Estatística e Gestão de Informação (CEGI)
Instituto Superior de Estatística e Gestão de Informação (ISEGI), Lisbon, Portugal
Remote Sensing Unit (RSU), Portuguese Geographic Institute (IGP), Lisbon, Portugal
mario.caetano@igeo.pt

Cidália C. Fonte and Nuno Cortês
Institute for Systems and Computer Engineering at Coimbra
Department of Mathematics, University of Coimbra, Coimbra, Portugal
cfonte@mat.uc.pt

Abstract—This paper proposes a method for incorporating uncertainty into the reference databases used to assess the accuracy of land cover maps. A five-level linguistic scale of confidence in land cover labelling is applied to each sample observation, and the linguistic values are converted into fuzzy numbers. This information is introduced into a fuzzy confusion matrix, from which fuzzy accuracy measures similar to the global, user's and producer's accuracy are derived using fuzzy arithmetic. These measures are fuzzy numbers that incorporate the uncertainty in identifying the reference land cover class of the sample data. The fuzzy accuracy measures can be defuzzified into real numbers, i.e. crisp measures, which can then be compared with the accuracy results obtained from traditional confusion matrices. The proposed methodology is tested on a case study: the quality of a map of Continental Portugal, derived from the automatic classification of MERIS images, is evaluated using a reference database generated with the proposed methodology.

Keywords: land cover maps; accuracy assessment; reference database uncertainty; fuzzy numbers; fuzzy arithmetic

I. INTRODUCTION

Land cover maps are essential for understanding several geographical phenomena, such as climate change, biodiversity loss, land cover change and vegetation distribution.
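The abstract outlines a pipeline in which linguistic confidence levels become fuzzy numbers, fuzzy arithmetic yields fuzzy accuracy measures, and defuzzification turns them back into crisp values. The minimal Python sketch below illustrates that pipeline for an overall-accuracy-like measure under simplifying assumptions that are ours, not the paper's: confidence levels 1 (lowest) to 5 (highest) are mapped to hypothetical triangular fuzzy numbers on [0, 1]; each sample whose map label agrees with its reference label contributes its confidence fuzzy number to the fuzzy count of correct samples; and the result is defuzzified by its centroid. The names `TFN`, `CONFIDENCE` and `fuzzy_overall_accuracy`, and the sample data, are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number: support [a, c], membership 1 at b (a <= b <= c)."""
    a: float
    b: float
    c: float

    def __add__(self, other: "TFN") -> "TFN":
        # The sum of two triangular fuzzy numbers is triangular: add componentwise.
        return TFN(self.a + other.a, self.b + other.b, self.c + other.c)

    def scale(self, k: float) -> "TFN":
        # Multiplication by a positive crisp constant k (k > 0 assumed).
        return TFN(self.a * k, self.b * k, self.c * k)

    def centroid(self) -> float:
        # Centre-of-gravity defuzzification of a triangular membership function.
        return (self.a + self.b + self.c) / 3


# HYPOTHETICAL mapping from confidence level (1 = lowest, 5 = highest) to a
# fuzzy degree in [0, 1] that the reference label is correct.
CONFIDENCE = {
    1: TFN(0.0, 0.0, 0.25),
    2: TFN(0.0, 0.25, 0.5),
    3: TFN(0.25, 0.5, 0.75),
    4: TFN(0.5, 0.75, 1.0),
    5: TFN(0.75, 1.0, 1.0),
}


def fuzzy_overall_accuracy(samples):
    """samples: iterable of (map_label, reference_label, confidence_level).

    Simplified illustrative scheme: agreeing samples add their confidence TFN
    to the fuzzy count of correct samples; disagreeing samples add nothing.
    """
    samples = list(samples)
    correct = TFN(0.0, 0.0, 0.0)
    for map_label, ref_label, level in samples:
        if map_label == ref_label:
            correct = correct + CONFIDENCE[level]
    # Dividing by the crisp number of samples is a positive scaling.
    return correct.scale(1 / len(samples))


if __name__ == "__main__":
    data = [
        ("forest", "forest", 5),
        ("forest", "forest", 4),
        ("agriculture", "forest", 3),
        ("urban", "urban", 2),
    ]
    acc = fuzzy_overall_accuracy(data)
    print(acc)             # TFN(a=0.3125, b=0.5, c=0.625)
    print(acc.centroid())  # about 0.479
```

Adding triangular fuzzy numbers and scaling them by a positive crisp constant keeps them triangular, which is why componentwise operations suffice here. Ratios with fuzzy denominators, as would arise for user's or producer's accuracy when row or column totals are themselves fuzzy, are not triangular in general and would call for α-cut interval arithmetic instead.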
Although producing land cover maps is important, evaluating their quality is equally fundamental: if decisions are based on these maps, the quality of the maps will inevitably affect the quality of the decisions. The methodology traditionally applied to assess the accuracy of land cover maps compares the produced map with a reference sample database (Foody, 2002), which represents the 'true' land cover. The sample database is composed of observations collected at several geographical sites, which are inspected through field visits and/or high-resolution satellite or aerial images. The comparison between the two datasets is represented in a confusion matrix, in which the reference data and the map data are generally placed in the columns and rows, respectively. This approach assumes that only one land cover class exists at each geographical site. However, more than one land cover class may exist at many locations, and technicians often have difficulty choosing the most adequate class. This uncertainty arises because: 1) transitions between land cover classes are rarely abrupt; 2) landscapes are fragmented; or 3) there is a natural continuum between land cover types, which makes it difficult to discriminate between the land cover classes defined in the nomenclature. Several authors have used fuzzy set theory to deal with this uncertainty (e.g., Woodcock & Gopal, 2000; Lein, 2003). Gopal and Woodcock (1994) carried out one of the pioneering studies that introduced fuzzy sets into reference databases to assess the accuracy of crisp land cover maps. In their work, the uncertainty of photo-interpretation in building a reference database was introduced through a rating system based on a linguistic scale. This scale rests on the premise that experts most often use linguistic constructs to describe map accuracy (Woodcock & Gopal, 2000).
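The traditional confusion-matrix comparison described above yields the crisp global (overall), user's and producer's accuracy that fuzzy measures are later compared against. The minimal Python sketch below computes them, assuming rows hold map classes and columns hold reference classes, as stated in the text. The function name `accuracy_measures`, the class names and the counts are hypothetical, chosen only for illustration.

```python
def accuracy_measures(matrix):
    """Return overall accuracy and per-class user's and producer's accuracy.

    matrix[i][j] = number of sample sites labelled class i on the map
    (row) and class j in the reference database (column).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    overall = diagonal / total

    # User's accuracy: correct sites / row total (map perspective, commission errors).
    users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct sites / column total (reference perspective, omission errors).
    producers = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, users, producers


if __name__ == "__main__":
    classes = ["forest", "agriculture", "urban"]  # hypothetical classes
    cm = [
        [40, 5, 1],  # mapped as forest
        [8, 30, 2],  # mapped as agriculture
        [2, 5, 7],   # mapped as urban
    ]
    overall, users, producers = accuracy_measures(cm)
    print(f"overall accuracy: {overall:.3f}")  # overall accuracy: 0.770
    for name, ua, pa in zip(classes, users, producers):
        print(f"{name}: user's {ua:.3f}, producer's {pa:.3f}")
```

Keeping the map in rows and the reference in columns matches the convention in the text; some software libraries use the transposed layout, so the row and column roles should be checked before reusing their output.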
The linguistic scale used by the authors is composed of five levels: 1) Absolutely wrong; 2) Understandable but wrong; 3) Reasonable or acceptable answer; 4) Good answer; 5) Absolutely right. At each sample observation, the technician uses this scale to express their perception when identifying the reference land cover classes. Based on this scale, the authors developed several methods built on fuzzy functions that provide more information about the accuracy of a map than a confusion matrix does. The results are presented in four tables, which provide information about the frequency of errors (MAX and RIGHT operators), the magnitude of errors (DIFFERENCE operator), the source of errors (MEMBERSHIP operator) and the nature of errors (CONFUSION and AMBIGUITY operators). Gopal and Woodcock (1994) noted the need to develop a method to express the results as a single hard accuracy measure,
Research by Pedro Sarmento was funded by the 'Fundação para a Ciência e Tecnologia' (SFRH/BD/61900/2009).