Data-Driven Predictive Modeling of Mineral Prospectivity Using Random Forests: A Case Study in Catanduanes Island (Philippines)

Emmanuel John M. Carranza 1,3 and Alice G. Laborte 2

Received 23 March 2015; accepted 20 April 2015

The Random Forests (RF) algorithm is a machine learning method that has recently been demonstrated as a viable technique for data-driven predictive modeling of mineral prospectivity, and thus, it is instructive to further examine its usefulness in this particular field. A case study was carried out using data from Catanduanes Island (Philippines) to investigate further (a) if RF modeling can be used for data-driven modeling of mineral prospectivity in areas with few (i.e., < 20) mineral occurrences and (b) if RF modeling can handle predictor variables with missing values. We found that RF modeling outperforms evidential belief (EB) modeling of prospectivity for hydrothermal Au–Cu deposits in Catanduanes Island, where 17 hydrothermal Au–Cu prospects are known to exist. Moreover, just like EB modeling, RF modeling allows analysis of the spatial relationships between known prospects and individual layers of predictor data. Furthermore, RF modeling can handle missing values in predictor data through an RF-based imputation technique whereas in EB modeling, missing values are simply represented by maximum uncertainty. Therefore, the RF algorithm is a potentially useful method for data-driven predictive modeling of mineral prospectivity in regions with few (i.e., < 20) occurrences of mineral deposits of the type sought. However, further testing of the method in other regions with few mineral occurrences is warranted to fully determine its usefulness in data-driven predictive modeling of mineral prospectivity.

KEY WORDS: Regression trees, Missing data, Hydrothermal Au–Cu deposits, Catanduanes (Philippines), GIS.
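To make the workflow summarized above concrete, the following is a minimal sketch, not the implementation used in this study. It assumes a hypothetical table of 12 training cells with three illustrative predictors (distance to fault, distance to intrusion, geochemical anomaly score), fills missing values with the column median as a simple stand-in for the RF-based imputation, and fits a small bagged ensemble of randomized trees in plain Python. All function names, data values, and parameters are assumptions made for illustration.

```python
import random
from statistics import median

def impute_median(rows):
    """Fill None with the column median (stand-in for RF-based imputation)."""
    meds = [median(v for v in col if v is not None) for col in zip(*rows)]
    return [[meds[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, rng, depth=0, max_depth=3):
    """Grow a tree; a leaf stores the fraction of deposit cells it contains."""
    if depth == max_depth or len(set(y)) == 1:
        return sum(y) / len(y)
    n_feat = len(X[0])
    # Random feature subset at each split, as in Random Forests.
    feats = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))
    best = None
    for j in feats:
        # Candidate thresholds: every observed value except the largest.
        for t in sorted(set(r[j] for r in X))[:-1]:
            left = [i for i, r in enumerate(X) if r[j] <= t]
            right = [i for i, r in enumerate(X) if r[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # selected features are constant in this node
        return sum(y) / len(y)
    _, j, t, left, right = best
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Follow the splits until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_trees=50, seed=0):
    """Fit each tree on a bootstrap sample of the training cells."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def prospectivity(forest, row):
    """Average the leaf values over all trees to get a score in [0, 1]."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)

# Hypothetical cells: [distance to fault (km), distance to intrusion (km),
# geochemical anomaly score]; None marks a missing value.
X_raw = [
    [0.2, 0.5, 90], [0.4, 0.3, None], [0.6, 0.8, 85],
    [0.3, 0.6, 95], [0.8, 0.4, 80], [0.5, 0.9, 88],   # deposit cells
    [4.0, 5.0, 20], [6.0, 3.5, 15], [5.5, 7.0, None],
    [3.5, 4.5, 25], [7.0, 6.0, 10], [4.5, 8.0, 18],   # non-deposit cells
]
y = [1] * 6 + [0] * 6

X = impute_median(X_raw)
forest = fit_forest(X, y)
near = prospectivity(forest, [0.1, 0.1, 99])  # cell resembling the deposits
far = prospectivity(forest, [9.0, 9.0, 1])    # cell resembling barren ground
print(f"near-deposit score: {near:.2f}, barren score: {far:.2f}")
```

In practice one would more likely rely on an established RF implementation rather than this toy ensemble; the sketch is meant only to show how bootstrap sampling, random feature selection, and averaging of tree outputs combine into a prospectivity score when few deposit cells are available.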
1 Department of Earth and Oceans, James Cook University, Townsville QLD 4811, Australia.
2 International Rice Research Institute, Los Baños, Laguna, Philippines.
3 To whom correspondence should be addressed; e-mail: john.carranza@jcu.edu.au

© 2015 International Association for Mathematical Geosciences
Natural Resources Research (© 2015) DOI: 10.1007/s11053-015-9268-x

INTRODUCTION

Data-driven predictive modeling of mineral prospectivity involves the analysis of the spatial relationship between various predictor variables represented by layers of perceived spatial evidence of mineral occurrences and a target variable represented by a layer of known mineral deposit occurrences (Bonham-Carter 1994; Pan and Harris 2000; Carranza 2008). It is appropriate in moderately to well-explored (or so-called brownfields) regions, where the goal is to delineate new targets for further detailed exploration. A model of the spatial relationship between the predictors and the target variable is used to define weights to be assigned to every evidence layer for the prediction of prospective areas. The various methods that have been most applied (for data-driven modeling of such spatial relationship in a geographic information system or