Parallel Computing 81 (2019) 22–31
Dynamic look-ahead in the reduction to band form for the singular value decomposition

Andrés E. Tomás^a, Rafael Rodríguez-Sánchez^b,∗, Sandra Catalán^a, Rocío Carratalá-Sáez^a, Enrique S. Quintana-Ortí^a

^a Dept. Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain
^b Dep. Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
Article history: Received 24 May 2018; Revised 11 September 2018; Accepted 28 November 2018; Available online 29 November 2018.

Keywords: Singular value decomposition; Two-stage reduction; Look-ahead; Runtime; Dynamic scheduling; Multicore processors.
Abstract

We investigate the introduction of look-ahead in two-stage algorithms for the singular value decomposition (SVD). Our approach relies on a specialized reduction for the first stage that produces a band matrix with the same upper and lower bandwidth instead of the conventional upper triangular-band matrix. In the case of a CPU-GPU server, this alternative form accommodates a static look-ahead into the algorithm in order to overlap the reduction of the "next" panel on the CPU and the "current" trailing update on the GPU. For multicore processors, we leverage the same compact form to formulate a version of the algorithm that advances the reduction of "future" panels, yielding a dynamic look-ahead that overcomes the performance bottleneck that the sequential panel factorization represents.

© 2018 Elsevier B.V. All rights reserved.
1. Introduction
The Singular Value Decomposition (SVD) is a handy numerical tool for the computation of the matrix rank, the cal-
culation of low-rank approximations to a matrix, and the solution of ill-conditioned least squares problems. These linear
algebra kernels arise in medicine, geosciences, material sciences, crystallography, security, and information retrieval, among
others [1–3].
Given a dense matrix A ∈ R^{m×n}, the standard algorithm to compute its SVD first obtains a reduced bidiagonal matrix B ∈ R^{m×n}, using a sequence of Householder reflectors that are applied from the left- and right-hand sides of A as:

A = U B V^T,  (1)

where U ∈ R^{m×m}, V ∈ R^{n×n} are both orthogonal and B ∈ R^{m×n} is upper bidiagonal [4]. (Without loss of generality, hereafter we will assume that m ≥ n. Otherwise, we can simply compute the SVD of A^T.) The cost of this direct two-sided reduction (TSR) algorithm is 4n^2(m − n/3) floating-point operations (flops). The singular values of A are then computed from the bidiagonal matrix B, via the QD algorithm or some divide-and-conquer variant, adding a minor cost to the flop count (unless the singular vectors are also required) [4].
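To make the direct reduction concrete, the following NumPy sketch implements a textbook Golub-Kahan bidiagonalization via Householder reflectors. This is an unblocked reference version, not the paper's two-stage band reduction; the function names `householder` and `bidiagonalize` are ours. Because B is obtained through orthogonal transformations only, its singular values coincide with those of A.

```python
import numpy as np

def householder(x):
    """Return v, beta such that (I - beta v v^T) x is a multiple of e_1."""
    v = x.copy().astype(float)
    sigma = np.linalg.norm(x)
    if sigma == 0.0:
        return v, 0.0
    # Choose the sign that avoids cancellation in v[0]
    v[0] += np.sign(x[0]) * sigma if x[0] != 0 else sigma
    beta = 2.0 / np.dot(v, v)
    return v, beta

def bidiagonalize(A):
    """Golub-Kahan reduction B = U^T A V with B upper bidiagonal (m >= n)."""
    m, n = A.shape
    B = A.astype(float).copy()
    for k in range(n):
        # Left reflector annihilates column entries below the diagonal
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflector annihilates row entries right of the superdiagonal
            v, beta = householder(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
    return B
```

Note that the left reflector applications are matrix-vector (Level-2 BLAS) operations, which is precisely the property discussed next.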
An algorithm for the direct reduction to bidiagonal form A → B, via Householder reflectors, will necessarily perform a significant fraction of its flops in terms of the Level-2 BLAS (basic linear algebra subprograms) [6]. As a consequence, this
∗ Corresponding author.
E-mail addresses: tomasan@uji.es (A.E. Tomás), rafaelrs@ucm.es (R. Rodríguez-Sánchez), catalans@uji.es (S. Catalán), rcarrata@uji.es (R. Carratalá-Sáez), quintana@uji.es (E.S. Quintana-Ortí).
https://doi.org/10.1016/j.parco.2018.11.001
0167-8191/© 2018 Elsevier B.V. All rights reserved.