DRAFT

Learning Geographical Manifolds: A Kernel Trick for Geographical Machine Learning*

Levi John Wolf (1,2) and Elijah Knaap (2)

1 School of Geographical Sciences, University of Bristol, levi.john.wolf@bristol.ac.uk
2 Center for Geospatial Sciences, University of California, Riverside

May 15, 2019

Abstract

Dimension reduction is one of the oldest concerns in geographical analysis. Despite significant, longstanding attention to dimension reduction in geographical problems, recent advances in non-linear dimension reduction techniques, called manifold learning, have not been adopted in classic data-intensive geographical problems. More generally, machine learning methods for geographical problems often focus on applying standard machine learning algorithms to geographic data rather than on applying true "spatially-correlated learning," in the words of Kohonen. We therefore suggest a general way to incentivize geographical learning in machine learning algorithms and link it to many past methods that introduced geography into statistical techniques. We develop a specific instance of this approach by specifying two geographical variants of Isomap, a non-linear dimension reduction, or "manifold learning," technique. We also provide a method for assessing what incorporating geography adds and for estimating the manifold's intrinsic geographic scale. To illustrate the concepts and provide interpretable results, we conduct a dimension reduction on the geographical and high-dimensional structure of social and economic data for Brooklyn, New York. Overall, this paper's main endeavor, defining and explaining a way to "geographize" many machine learning methods, yields interesting and novel results for manifold learning and for the estimation of intrinsic geographical scale in unsupervised learning.
* This material is based on work supported by the National Science Foundation under Grant #1762160.

1 Introduction: Manifold Learning in Geography

In the current era of Big Data, cloud computing, and adversarial artificial intelligence, quantitative research in nearly every field is turning to computational methods and machine learning techniques to help model and interpret information in its respective domain. This is equally true of geography, where machine learning models are increasingly applied to multivariate problem domains contextualized by geographic space. Neighborhood analysis is a subfield of geography where this trend has been particularly visible (Arribas-Bel 2014; O'Brien et al. 2015; Delmelle 2016; Knaap 2017; Poorthuis and Zook