Fast Balanced Stochastic Truncation Via A Quadratic Extension of the Alternating Direction Implicit Iteration Ngai Wong Department of Electrical and Electronic Engineering The University of Hong Kong Pokfulam Road, Hong Kong Email: nwong@eee.hku.hk Venkataramanan Balakrishnan School of Electrical and Computer Engineering Purdue University West Lafayette, IN, USA Email: ragu@ecn.purdue.edu Abstract— Balanced truncation (BT) model order reduction (MOR) is known for its superior accuracy and computable error bounds. Balanced stochastic truncation (BST) is a particular BT procedure that provides a general, structure-independent MOR framework to preserve both passivity and stability of original models. Its application toward large scale systems, however, has been limited by the complexity of solving large size continuous time algebraic Riccati equations (CAREs). This paper introduces a novel quadratic extension of the alternating direction implicit (ADI) iteration, called QADI, that efciently solves a CARE. A Cholesky factor variant of QADI, called CFQADI, further exploits low rank matrices and and produces solution in factor form that greatly accelerates BST. Remarkable efciency of the proposed BST/(CF)QADI integration is demonstrated with numerical examples. I. I NTRODUCTION Simulation of interconnect models after parasitic extraction, despite being a big time sink, is a critical step to ensure functionality of a chip. The importance of this post-layout verication is ever-increasing as wire thinning and crosstalk have pushed the interconnect delay beyond gate delay. A major difculty in this modeling exercise is the huge data volume and model sizes that forbid direct computer handling. Model order reduction (MOR) has subsequently become an integral operation in which high order models are considerably reduced to smaller ones without much sacrice in accuracy [1].Gener- ally, there are two classes of MOR algorithms, namely, the traditional projection-based algorithms [2]–[4] and the recently promoted bal- anced truncation (BT) schemes [5]–[14]. Starting with a state space formulation of the initial model, the former approach projects the original system onto low rank subspaces that capture most state activities. The resultant reduced order models often show similar responses to the original systems, but there is neither direct error connection between the two nor optimality in the nal models. Representative schemes in this class are PRIMA [2], Pad´ e approx- imation via Lanczos (PVL) [3], and pole analysis via congruence transformations (PACT) [4]. PRIMA and PACT preserve passivity 1 , if any, of the original systems. However, both algorithms assume special structures in the internal (passive) state space and these restrictions are not always feasible or practical [8]. Balanced truncation (BT) represents a distinct paradigm arising from rich theories in control. Instead of relying on projection, the “balancing” step aligns the states in descending relevance based on their involvement in input/output activities or energy transfer. The “truncation” step then throws away the least important states, resulting in a usually much smaller, (near-)optimal model of supe- rior accuracy [8]. A bonus is that such schemes often come with a deterministic bound for the approximation error (e.g, [5], [7]). 1 Passivity means a system is dissipative. Non-passive reduced order models (of passive original systems) can result in erroneous global simulation [1], [2]. Algorithms among this class are optimal Hankel-norm approxima- tion, standard BT, and the passivity-preserving balanced stochastic truncation (BST) [5]–[8]. An obstacle in these BT realizations is the expensive solution of large size matrix equations followed by large size matrix factorizations. For example, standard BT calls for the solution of a pair of dual Lyapunov equations (linear matrix equations), while BST comes with a pair of dual continuous time algebraic Riccati equations (CAREs) (quadratic matrix equations) whose solution can be computationally prohibitive [12]. To lower the cost of standard BT, recent results took advantage of the low rank input/output matrices pertinent to interconnect models and came up with the Cholesky factor (CF) standard BT variants of speed comparable to the projection-based methods [9]–[11], [14]. Standard BT, however, does not necessarily preserve passivity. BSTguarantees stability and passivity and poses no special structural requirements on the internal state space [8], but is highly restricted by large size CAREs. Conventional techniques of solving a CARE include forming a Hamiltonian matrix and identifying its stable invariant subspace, or by Newton method that solves a Lyapunov equation in each iteration step [12], [15], [16]. The former way does not explicitly exploit low rank/sparse matrices and is generally slow, while the latter requires an appropriate initial condition which may not be readily available. The contribution of this paper is the formulation of a quadratic alternating direction implicit (ADI) algorithm [17], called QADI, that efciently solves a (large size) CARE. A CF variant of QADI, called CFQADI, further exploits low rank matrices and produces factor- form solution that accelerates BST. The BST/(CF)QADI approach constitutes a general framework for high speed, large scale, passivity- preserving MOR. This paper is organized as follows. Section II revises the BST settings and the ADI method for solving Lyapunov equations. Section III presents QADI, proves its convergence, and derives CFQADI which makes use of low rank matrices to provide speedup and memory savings. Remarkable efciency of adapting (CF)QADI for BST realization is seen from the numerical examples in Section IV. Finally, Section VI draws the conclusion. II. PRELIMINARIES A. Basics of BST Modeling of VLSI interconnects and pin packages generally in- volve passive RLC components. We consider a large scale RLC circuit represented in a state space format ˙ x = A0x + B0u (1a) y = C0x + D0u (1b) where A0 R n×n , B0 R n×m , C0 R m×n , D0 R m×m , B0, C0 are generally of low ranks (i.e., m n) and u, y are power- 0-7803-9254-X/05/$20.00 ©2005 IEEE. 801