A primer on understanding Google Earth Engine APIs

Rui S. Reis ab, Nuno Datia ab, M. P. M. Pato bc
a ISEL - Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa
b NovaLincs, FCT – Universidade Nova de Lisboa
c Instituto de Biofísica e Engenharia Biomédica, FC-UL
ruisreis@hotmail.com, {datia,mpato}@deetc.isel.pt

Abstract— This article introduces the rationale behind using Google Earth Engine and the advantages it offers as an alternative to handling large volumes of georeferenced data with the familiar on-premises Geographic Information Systems. Google Earth Engine is an efficient development framework that comes in two basic flavors: an online integrated development environment that runs on the browser's JavaScript engine, and APIs that can be deployed in either a Python or a NodeJS environment. After presenting a small number of use cases, representative of the Google Earth Engine design patterns, and building a prototype class using both variants, we conclude that both platforms are merely proxy APIs to the Google Earth Engine and show no measurable performance difference. However, since they run in fundamentally different contexts (a JavaScript engine in a web browser, integrating seamlessly with Google Maps, versus a Python environment), we argue that their utility depends on the user requirements rather than them being true alternatives.

Keywords: Google Earth Engine, JavaScript, Python, Code Editor, Georeferenced Data, Multi-spectral Data.

I. INTRODUCTION

Google Earth Engine [1] (GEE) is, primarily, a distributed parallel computing platform. It is designed around a functional language pattern, although supported by an object model, and a map-reduce [2] distributed workload paradigm.
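The map-reduce paradigm behind GEE can be illustrated with a minimal plain-Python sketch. The digital numbers and the 1/10000 scale factor below are purely illustrative (they are not taken from this article); the point is the pattern: a function mapped over every element, followed by a fold that aggregates the results.

```python
from functools import reduce

# Hypothetical raw digital numbers for one band (illustrative values only).
dns = [1200, 4500, 3000, 800]

# Map: apply a per-element transformation (digital number -> reflectance).
reflectance = list(map(lambda dn: dn / 10000, dns))

# Reduce: fold the mapped values into a single aggregate (their maximum).
peak = reduce(max, reflectance)
print(peak)  # -> 0.45
```

In GEE the same two steps run server-side over entire image collections, with Google's infrastructure parallelizing the map stage.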
Leveraging the sheer computing power delivered by the Google infrastructure and a multi-petabyte georeferenced data repository, GEE is an efficient development framework for all tasks related to selecting, computing and displaying georeferenced data.

A. Georeferenced data

Working with georeferenced data is a grueling task considering the large volumes of information and the complexity and diversity of storage formats. Using, as an example, remote sensing multi-spectral data gathered by instruments on board satellites, it is easy to understand the complexity of obtaining, interpreting and making calculations with this kind of data:

- Multi-spectral data is arranged in bands that store the reflectance measurements over a range of wavelengths. For instance, the Level-1C product of the instruments aboard the Sentinel-2 [3] constellation carries 13 reflectance bands and 3 additional data quality bands. For Sentinel-2, each coordinate, a pair of longitude and latitude values, represents a 10 m² area and is associated with a vector of 16 values, one for each of the reflectance and quality bands;
- This data is organized using a set of rules that are, generally, specific to each satellite. Often, an interpretation layer must be used to transform the source format into one of the standard (or "de facto") file formats, so that it can be used by one of the existing libraries (e.g. GDAL [4]);
- The calculations require substantial resources for storage and processing. Until recently, many researchers opted almost exclusively to use calculated products, which are datasets with multi-spectral calculated data, mostly in the form of indices, like NDVI [5].
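As an illustration of how a per-coordinate band vector translates into an index such as NDVI, the sketch below computes it from a hypothetical pixel. The band names follow the Sentinel-2 convention (B4 = red, B8 = near-infrared); the reflectance values are made up, and only the two bands NDVI needs are shown out of the 16-value vector described above.

```python
# Hypothetical reflectance vector for one coordinate, keyed by band name
# (a Sentinel-2 Level-1C pixel carries 13 reflectance bands; only the
# two needed for NDVI are shown, with made-up values).
pixel = {"B4": 0.08, "B8": 0.40}  # B4 = red, B8 = near-infrared

def ndvi(bands):
    """NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to 1."""
    nir, red = bands["B8"], bands["B4"]
    return (nir - red) / (nir + red)

print(round(ndvi(pixel), 3))  # -> 0.667
```

A calculated NDVI product stores just this single value per coordinate, which is why such products reduce the data complexity, as discussed next.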
- These were published by organizations (for-profit or not) like Copernicus 1 or VITO 2, and the availability of these datasets is delayed in time relative to the actual date of retrieval;
- The usage of calculated products may reduce the complexity of the data (for instance, an NDVI dataset has a single value per coordinate), but the information volume is still very large. If we take a single day of NDVI data, gathered by the Proba-V [6] instruments, for a 3.245 km² area in Portugal, where each coordinate represents a 300 m² area, we get an approximately 1 GB [7] GeoTIFF [8] file.

However, besides the storage requirements, an adequate tool must be used to make additional calculations on this data. Consider using, for example, the GDAL [4] library embedded in an integrated Geographical Information System (GIS) tool, QGIS [9]. Using this on-premises 3 setup and the NDVI dataset described previously, the calculation of the arithmetic average of the NDVI value, over every coordinate, for two areas of approximately 300 km² and 170 km², took close to 5 minutes (on a computer with an Intel Core i5-6200U, 8 GB RAM and a 256 GB SSD) [7].

Multi-spectral data is very sensitive to the presence of clouds and atmospheric aerosols. This means that multi-spectral data is potentially sparse due to varying weather conditions and pollution. Methods like Maximum Value Composite [10], which require handling several of the previously described datasets in order to obtain significant NDVI values, demand even larger amounts of storage and computing resources [7].

1 https://www.copernicus.eu/en
2 https://vito.be/en
3 The software is installed and runs on computers on the premises of the person or organization using the software, rather than at a remote facility.

i-ETC: ISEL Academic Journal of Electronics, Telecommunications and Computers, Vol. 6, n. 1 (2020), ID-4, http://journals.isel.pt
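The Maximum Value Composite idea referred to above, keeping the highest NDVI observed per coordinate over a compositing period so that cloud-contaminated (artificially low) readings are suppressed, can be sketched in a few lines. The daily 2x2 NDVI grids are hypothetical, chosen only to show the per-coordinate maximum.

```python
# Hypothetical daily NDVI grids (2x2) over a compositing period; the low
# values stand in for cloud-contaminated observations.
day1 = [[0.10, 0.55], [0.30, 0.05]]
day2 = [[0.60, 0.20], [0.25, 0.50]]
day3 = [[0.40, 0.35], [0.70, 0.45]]

def max_value_composite(grids):
    """Per-coordinate maximum across the period (Maximum Value Composite)."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[max(g[r][c] for g in grids) for c in range(cols)]
            for r in range(rows)]

mvc = max_value_composite([day1, day2, day3])
print(mvc)  # -> [[0.6, 0.55], [0.7, 0.5]]
```

Note that a real composite must hold several full-resolution daily rasters in memory or on disk at once, which is precisely why the method multiplies the storage and computing requirements described above.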