PROC. OF THE 9th PYTHON IN SCIENCE CONF. (SCIPY 2010)

Theano: A CPU and GPU Math Compiler in Python

James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, Yoshua Bengio

Abstract

Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy’s syntax with the speed of optimized native machine language. The user composes mathematical expressions in a high-level description that mimics NumPy’s syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for GPU), and compiles them into dynamically loaded Python modules, all automatically. Common machine learning algorithms implemented with Theano are from 1.6× to 7.5× faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU and between 6.5× and 44× faster when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.

Introduction

Python is a powerful and flexible language for describing large-scale mathematical calculations, but the Python interpreter is in many cases a poor engine for executing them. One reason is that Python uses full-fledged Python objects on the heap to represent simple numeric scalars. To reduce the overhead in numeric calculations, it is important to use array types such as NumPy’s ndarray so that single Python objects on the heap can stand for multidimensional arrays of numeric scalars, each stored efficiently in the host processor’s native format.
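The per-scalar overhead described above can be observed with a minimal sketch. It assumes CPython and uses the standard library's `array` module as a stand-in for NumPy's ndarray, so that it runs without third-party packages; the variable names are illustrative only.

```python
import sys
from array import array

n = 1000

# A list of Python floats: every element is a separate object on the heap.
boxed = [float(i) for i in range(n)]

# A single array object: elements are stored contiguously as native C doubles.
# (Stand-in for a NumPy ndarray of dtype float64.)
packed = array("d", boxed)

# Memory used by one scalar in each representation.
bytes_per_boxed_float = sys.getsizeof(boxed[0])  # full Python object
bytes_per_packed_float = packed.itemsize         # raw native value

# Total footprint: list container plus all float objects, versus one array.
boxed_total = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
packed_total = sys.getsizeof(packed)

print("per scalar:", bytes_per_boxed_float, "vs", bytes_per_packed_float, "bytes")
print("total:     ", boxed_total, "vs", packed_total, "bytes")
```

The exact sizes depend on the interpreter and platform, but the boxed representation pays for object headers and pointers on every scalar, whereas the packed one pays only once for the container.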
[NumPy] provides an N-dimensional array data type, and many functions for indexing, reshaping, and performing elementary computations (exp, log, sin, etc.) on entire arrays at once. These functions are implemented in C for use within Python programs. However, the composition of many such NumPy functions can be unnecessarily slow when each call is dominated by the cost of transferring memory rather than the cost of performing calculations [Alted]. [numexpr] goes one step further by providing a loop fusion optimization that can glue several element-wise computations together. Unfortunately, numexpr requires an unusual syntax (the expression must be encoded as a string within the code), and at the time of this writing, numexpr is limited to optimizing element-wise computations.

[Cython] and [scipy.weave] address Python’s performance issue by offering a simple way to hand-write crucial segments of code in C (or a dialect of Python which can be easily compiled to C, in Cython’s case). While this approach can yield significant speed gains, it is labor-intensive: if the bottleneck of a program is a large mathematical expression comprising hundreds of elementary operations, manual program optimization can be time-consuming and error-prone, making an automated approach to performance optimization highly desirable.

The corresponding author is with Université de Montréal, e-mail: james.bergstra@umontreal.ca.

Theano, on the other hand, works on a symbolic representation of mathematical expressions, provided by the user in a NumPy-like syntax. Access to the full computational graph of an expression opens the door to advanced features such as symbolic differentiation of complex expressions, but more importantly allows Theano to perform local graph transformations that can correct many unnecessary, slow or numerically unstable expression patterns.
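To make the idea of an expression graph concrete, the following is a toy sketch in pure Python. It is not Theano's implementation or API: the classes `Var`, `Const`, `Add`, `Mul` and the functions `simplify` and `count_nodes` are hypothetical names introduced only for illustration. The sketch builds a graph with operator overloading, differentiates it symbolically, and applies a few local rewrites (x*1 -> x, x*0 -> 0, x+0 -> x) to shrink the resulting graph.

```python
class Expr:
    """Base class: operator overloading builds graph nodes instead of computing."""
    def __add__(self, other): return Add(self, wrap(other))
    def __radd__(self, other): return Add(wrap(other), self)
    def __mul__(self, other): return Mul(self, wrap(other))
    def __rmul__(self, other): return Mul(wrap(other), self)

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]
    def grad(self, wrt): return Const(1.0 if self is wrt else 0.0)

class Const(Expr):
    def __init__(self, value): self.value = float(value)
    def eval(self, env): return self.value
    def grad(self, wrt): return Const(0.0)

class Add(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)
    def grad(self, wrt): return Add(self.a.grad(wrt), self.b.grad(wrt))

class Mul(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)
    def grad(self, wrt):  # product rule
        return Add(Mul(self.a.grad(wrt), self.b), Mul(self.a, self.b.grad(wrt)))

def wrap(v):
    """Turn plain Python numbers into Const nodes."""
    return v if isinstance(v, Expr) else Const(v)

def simplify(e):
    """Local graph rewrites: x*1 -> x, x*0 -> 0, x+0 -> x."""
    if isinstance(e, Mul):
        a, b = simplify(e.a), simplify(e.b)
        for p, q in ((a, b), (b, a)):
            if isinstance(p, Const) and p.value == 0.0:
                return Const(0.0)
            if isinstance(p, Const) and p.value == 1.0:
                return q
        return Mul(a, b)
    if isinstance(e, Add):
        a, b = simplify(e.a), simplify(e.b)
        if isinstance(a, Const) and a.value == 0.0:
            return b
        if isinstance(b, Const) and b.value == 0.0:
            return a
        return Add(a, b)
    return e

def count_nodes(e):
    """Number of nodes in the expression tree."""
    if isinstance(e, (Add, Mul)):
        return 1 + count_nodes(e.a) + count_nodes(e.b)
    return 1

x = Var("x")
y = x * x + 3 * x          # symbolic graph for x^2 + 3x; nothing is computed yet
dy = y.grad(x)             # symbolic derivative, still full of *1 and *0 terms
dy_opt = simplify(dy)      # same function, smaller graph

print(count_nodes(dy), "nodes before,", count_nodes(dy_opt), "after")
print(dy_opt.eval({"x": 2.0}))
```

The rewrites here are purely algebraic; a real compiler would also need rules for numerical stability and for the array-valued case, which this sketch does not attempt.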
Once optimized, the same graph can be used to generate CPU as well as GPU implementations (the latter using CUDA) without requiring changes to user code.

Theano is similar to [SymPy], in that both libraries manipulate symbolic mathematical graphs, but the two projects have a distinctly different focus. While SymPy implements a richer set of mathematical operations of the kind expected in a modern computer algebra system, Theano focuses on fast, efficient evaluation of primarily array-valued expressions.

Theano is free open source software, licensed under the New (3-clause) BSD license. It depends upon NumPy, and can optionally use SciPy. Theano includes many custom C and CUDA code generators which are able to specialize for particular types, sizes, and shapes of inputs; leveraging these code generators requires gcc (CPU) and nvcc (GPU) compilers, respectively. Theano can be extended with custom graph expressions, which can leverage scipy.weave, PyCUDA, Cython, and other numerical libraries and compilation technologies at the user’s discretion. Theano has been actively and continuously developed and used since January 2008. It has been used in the preparation of numerous scientific papers and as a teaching platform for machine learning in graduate courses at l’Université de Montréal. Documentation and installation instructions can be found on Theano’s website [theano]. All Theano users should subscribe to the announce [1] mailing list (low traffic). There are medium traffic mailing lists for developer discussion [2] and user support [3].

This paper is organized as follows: "Case Study: Logistic Regression" shows how Theano can be used to solve a simple problem in statistical prediction. "Benchmarking Results" presents some results of performance benchmarking on problems related to machine learning and expression evaluation. "What’s in Theano" gives an overview of the design of Theano.
"Limitations and Future Work" outlines current limitations of our implementation and currently planned additions to Theano.

[1] http://groups.google.com/group/theano-announce
[2] http://groups.google.com/group/theano-dev
[3] http://groups.google.com/group/theano-users