Application of Approximate Matrix Multiplication to Neural Networks and Distributed SLAM

Brian Plancher* (Harvard University, Cambridge, Massachusetts), Lillian Pentecost* (Harvard University, Cambridge, Massachusetts), Camelia D. Brumar* (Worcester Polytechnic Institute, Worcester, Massachusetts), Saketh Rama* (Harvard University, Cambridge, Massachusetts), Iulian Brumar* (Harvard University, Cambridge, Massachusetts), David Brooks (Harvard University, Cambridge, Massachusetts)

*All authors have contributed equally and are ordered alphabetically.

Abstract—Computational efficiency is a critical constraint for a variety of cutting-edge real-time applications. In this work, we identify an opportunity to speed up the end-to-end runtime of two such compute-bound applications by incorporating approximate linear algebra techniques. In particular, we apply approximate matrix multiplication to artificial Neural Networks (NNs) for image classification and to the robotics problem of Distributed Simultaneous Localization and Mapping (DSLAM). Expanding upon recent sampling-based Monte Carlo approximation strategies for matrix multiplication, we develop updated theoretical bounds and an adaptive error prediction strategy. We then apply these techniques in the context of NNs and DSLAM, increasing the speed of both applications by 15-20% while maintaining 97% classification accuracy for NNs running on the MNIST dataset and keeping the average robot position error under 1 meter (vs. 0.32 meters for the exact solution). However, both applications experience variance in their results. This suggests that Monte Carlo matrix multiplication may be an effective technique to reduce the memory and computational burden of certain algorithms when used carefully, but more research is needed before these techniques can be widely used in practice.

Index Terms—approximation, linear algebra, neural networks, robotics, SLAM

I. INTRODUCTION

In the past few decades, a large body of work has focused on accelerating exact linear algebra kernels in hardware, motivating a range of inventions from GPU streaming multiprocessors to systolic arrays [1]. Since 2014, the dramatic proliferation of machine learning methods, particularly deep learning, has further increased the demand for efficient linear algebra operations while relaxing exactness requirements relative to traditional consumers of linear algebra in scientific computing [2]–[5].

Independently of trends in applied research, a surge of interest in approximation as a fundamental property of computational complexity has, in part, motivated a series of general approximation algorithms for fundamental linear algebra operations, including matrix multiplication. One of the best-known such proposals employs Monte Carlo sampling with replacement and offers asymptotic guarantees on the matrix norm of the resulting computation [6]–[8].

We observe that the application domains of both deep learning and robotics demand linear algebra operations but also tolerate some error in their results. This work builds on this intuition by applying and extending Monte Carlo methods for matrix multiplication in specific application domains and evaluating the resulting impact on end-to-end application speed and accuracy. Our applications of choice are neural networks for image classification and Distributed Simultaneous Localization and Mapping (DSLAM) for robotics, both of which rely heavily on matrix multiplication.

Focusing on matrix multiplication in particular, this work empirically evaluates the tightness of the error bounds of the sampling-with-replacement algorithm, which is an attractive approach due to its previously verified theoretical guarantees [6].
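To fix ideas, the column/row sampling scheme of [6] can be sketched as below. This is an illustrative NumPy implementation under assumed naming (`approx_matmul` is our label, not the authors'): `c` index pairs are drawn with replacement with probability proportional to the product of the corresponding column and row norms, and each sampled outer product is rescaled so the estimator is unbiased.

```python
import numpy as np

def approx_matmul(A, B, c, rng=None):
    """Monte Carlo approximation of A @ B using c sampled
    column/row pairs, drawn with replacement.

    Rescaling each sample by 1 / (c * p_k) makes the estimator
    unbiased: E[C @ R] = A @ B.
    """
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    # Norm-proportional probabilities minimize the expected
    # Frobenius-norm error: p_k proportional to ||A[:, k]|| * ||B[k, :]||.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Split the 1 / (c * p_k) rescaling evenly between the two factors.
    scale = 1.0 / np.sqrt(c * p[idx])
    C = A[:, idx] * scale            # sampled, rescaled columns of A
    R = B[idx, :] * scale[:, None]   # sampled, rescaled rows of B
    return C @ R                     # sum of c rescaled outer products
```

Increasing `c` trades computation for accuracy; when the norm-product mass concentrates on a few column/row pairs, a small `c` already recovers most of the product.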
We explore the limitations of the existing algorithm, propose modifications, and develop a corresponding error prediction model using computed error bounds. This version of Monte Carlo matrix multiplication is then applied in the context of our target applications in order to evaluate the practicality and potential performance improvements of theoretical results for interesting end-to-end, compute-constrained problems.

II. RELATED WORK

Simplified approaches to linear algebra can have a significant impact on computational overhead and memory requirements for critical applications, and they have been studied since the inception of factor models in psychology [9], [10]. More recently, a series of theoretical works have established compelling bounds on Monte Carlo algorithms for a set of linear algebra operations, including matrix multiplication, low-rank approximation, and matrix decomposition [6]–[8]. Similar randomized algorithms have been proposed for low-rank matrix factorizations, such as singular value and QR decomposition [11].

Many alternate approaches to approximate matrix multiplication exist. For example, one such proposal leverages Fast Fourier Transforms (FFTs) and treats matrix multiplication as a low-rank polynomial multiplication [12]. Another approach is based on random projections [13]. Finally, another family of approaches satisfies additional constraints, such as retaining a subset of columns unchanged in the approximation, or by