Fast Balanced Stochastic Truncation Via A Quadratic
Extension of the Alternating Direction Implicit Iteration
Ngai Wong
Department of Electrical and Electronic Engineering
The University of Hong Kong
Pokfulam Road, Hong Kong
Email: nwong@eee.hku.hk
Venkataramanan Balakrishnan
School of Electrical and Computer Engineering
Purdue University
West Lafayette, IN, USA
Email: ragu@ecn.purdue.edu
Abstract— Balanced truncation (BT) model order reduction (MOR) is
known for its superior accuracy and computable error bounds. Balanced
stochastic truncation (BST) is a particular BT procedure that provides
a general, structure-independent MOR framework to preserve both
passivity and stability of original models. Its application toward large
scale systems, however, has been limited by the complexity of solving
large size continuous time algebraic Riccati equations (CAREs). This
paper introduces a novel quadratic extension of the alternating direction
implicit (ADI) iteration, called QADI, that efficiently solves a CARE. A
Cholesky factor variant of QADI, called CFQADI, further exploits low
rank matrices and and produces solution in factor form that greatly
accelerates BST. Remarkable efficiency of the proposed BST/(CF)QADI
integration is demonstrated with numerical examples.
I. I NTRODUCTION
Simulation of interconnect models after parasitic extraction, despite
being a big time sink, is a critical step to ensure functionality of a
chip. The importance of this post-layout verification is ever-increasing
as wire thinning and crosstalk have pushed the interconnect delay
beyond gate delay. A major difficulty in this modeling exercise is
the huge data volume and model sizes that forbid direct computer
handling. Model order reduction (MOR) has subsequently become
an integral operation in which high order models are considerably
reduced to smaller ones without much sacrifice in accuracy [1].Gener-
ally, there are two classes of MOR algorithms, namely, the traditional
projection-based algorithms [2]–[4] and the recently promoted bal-
anced truncation (BT) schemes [5]–[14]. Starting with a state space
formulation of the initial model, the former approach projects the
original system onto low rank subspaces that capture most state
activities. The resultant reduced order models often show similar
responses to the original systems, but there is neither direct error
connection between the two nor optimality in the final models.
Representative schemes in this class are PRIMA [2], Pad´ e approx-
imation via Lanczos (PVL) [3], and pole analysis via congruence
transformations (PACT) [4]. PRIMA and PACT preserve passivity
1
, if
any, of the original systems. However, both algorithms assume special
structures in the internal (passive) state space and these restrictions
are not always feasible or practical [8].
Balanced truncation (BT) represents a distinct paradigm arising
from rich theories in control. Instead of relying on projection, the
“balancing” step aligns the states in descending relevance based
on their involvement in input/output activities or energy transfer.
The “truncation” step then throws away the least important states,
resulting in a usually much smaller, (near-)optimal model of supe-
rior accuracy [8]. A bonus is that such schemes often come with
a deterministic bound for the approximation error (e.g, [5], [7]).
1
Passivity means a system is dissipative. Non-passive reduced order models
(of passive original systems) can result in erroneous global simulation [1], [2].
Algorithms among this class are optimal Hankel-norm approxima-
tion, standard BT, and the passivity-preserving balanced stochastic
truncation (BST) [5]–[8]. An obstacle in these BT realizations is
the expensive solution of large size matrix equations followed by
large size matrix factorizations. For example, standard BT calls for
the solution of a pair of dual Lyapunov equations (linear matrix
equations), while BST comes with a pair of dual continuous time
algebraic Riccati equations (CAREs) (quadratic matrix equations)
whose solution can be computationally prohibitive [12]. To lower
the cost of standard BT, recent results took advantage of the low
rank input/output matrices pertinent to interconnect models and came
up with the Cholesky factor (CF) standard BT variants of speed
comparable to the projection-based methods [9]–[11], [14]. Standard
BT, however, does not necessarily preserve passivity. BSTguarantees
stability and passivity and poses no special structural requirements
on the internal state space [8], but is highly restricted by large size
CAREs. Conventional techniques of solving a CARE include forming
a Hamiltonian matrix and identifying its stable invariant subspace, or
by Newton method that solves a Lyapunov equation in each iteration
step [12], [15], [16]. The former way does not explicitly exploit low
rank/sparse matrices and is generally slow, while the latter requires
an appropriate initial condition which may not be readily available.
The contribution of this paper is the formulation of a quadratic
alternating direction implicit (ADI) algorithm [17], called QADI, that
efficiently solves a (large size) CARE. A CF variant of QADI, called
CFQADI, further exploits low rank matrices and produces factor-
form solution that accelerates BST. The BST/(CF)QADI approach
constitutes a general framework for high speed, large scale, passivity-
preserving MOR. This paper is organized as follows. Section II
revises the BST settings and the ADI method for solving Lyapunov
equations. Section III presents QADI, proves its convergence, and
derives CFQADI which makes use of low rank matrices to provide
speedup and memory savings. Remarkable efficiency of adapting
(CF)QADI for BST realization is seen from the numerical examples
in Section IV. Finally, Section VI draws the conclusion.
II. PRELIMINARIES
A. Basics of BST
Modeling of VLSI interconnects and pin packages generally in-
volve passive RLC components. We consider a large scale RLC
circuit represented in a state space format
˙ x = A0x + B0u (1a)
y = C0x + D0u (1b)
where A0 ∈ R
n×n
, B0 ∈ R
n×m
, C0 ∈ R
m×n
, D0 ∈ R
m×m
, B0,
C0 are generally of low ranks (i.e., m ≪ n) and u, y are power-
0-7803-9254-X/05/$20.00 ©2005 IEEE. 801