BIT 0006-3835/01/4105-0936 $16.00 2001, Vol. 41, No. 5, pp. 936–949 c Swets & Zeitlinger A FASTER AND SIMPLER RECURSIVE ALGORITHM FOR THE LAPACK ROUTINE DGELS * ERIK ELMROTH 1 and FRED G. GUSTAVSON 2 † 1 Department of Computing Science and HPC2N, Ume˚ a University, SE–901 87 Ume˚ a, Sweden. email: elmroth@cs.umu.se 2 IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA. email: gustav@watson.ibm.com Abstract. We present new algorithms for computing the linear least squares solution to overdeter- mined linear systems and the minimum norm solution to underdetermined linear systems. For both problems, we consider the standard formulation min ‖AX - B‖F and the trans- posed formulation min ‖A T X - B‖F , i.e, four different problems in all. The functionality of our implementation corresponds to that of the LAPACK routine DGELS. The new implementation is significantly faster and simpler. It outperforms the LAPACK DGELS for all matrix sizes tested. The improvement is usually 50–100% and it is as high as 400%. The four different problems of DGELS are essentially reduced to two, by use of explicit transposition of A. By explicit transposition we avoid computing Householder transformations on vectors with large stride. The QR factorization of block columns of A is performed using a recursive level-3 algorithm. By interleaving updates of B with the factorization of A, we reduce the number of floating point operations performed for the linear least squares problem. By avoiding redundant computations in the update of B we reduce the work needed to compute the minimum norm solution. Finally, we outline fully recursive algorithms for the four problems of DGELS as well as for QR factorization. AMS subject classification: 65F20, 65Y20. Key words: Overdetermined systems, underdetermined systems, linear least squares, minimum norm solution, recursion, high performance library software. 1 Introduction. The LAPACK algorithm DGELS [2] solves overdetermined linear systems in the least squares sense and it computes the minimum norm solution to underdeter- mined systems. We present a new simpler and significantly faster algorithm to perform these computations. Given an m × n matrix A with full row or column rank, and matrices X and B, both with nrhs columns and appropriate row dimensions, we consider the four problems: * Received September 2000. Communicated byGustaf S¨oderlind. † This research was conducted using the resources of High Performance Computing Center North (HPC2N). The work is part of an HPC2N–IBM collaboration (SUR grant). Financial support has been provided by the Swedish Research Council for Engineering Sciences under grant TFR 98-604.