Efficient Algorithm for “On-the-Fly” Error Analysis of Local or Distributed Serially Correlated Data

DAVID R. KENT IV,¹ RICHARD P. MULLER,¹* AMOS G. ANDERSON,¹ WILLIAM A. GODDARD III,¹ MICHAEL T. FELDMANN²

¹Materials and Process Simulation Center, Division of Chemistry and Chemical Engineering, California Institute of Technology (MC 139-74), Pasadena, California 91125
²Center for Advanced Computing Research, California Institute of Technology, Pasadena, California 91125

Received 22 January 2004; Accepted 6 March 2007
DOI 10.1002/jcc.20746
Published online 2 May 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: We describe the Dynamic Distributable Decorrelation Algorithm (DDDA), which efficiently calculates the true statistical error of an expectation value obtained from serially correlated data “on-the-fly,” as the calculation progresses. DDDA is an improvement on the Flyvbjerg-Petersen renormalization group blocking method (Flyvbjerg and Petersen, J Chem Phys 1989, 91, 461). This “on-the-fly” determination of statistical quantities allows dynamic termination of Monte Carlo calculations once a specified level of convergence is attained. This is highly desirable when the required precision might take days or months to compute but cannot be accurately estimated prior to the calculation. Furthermore, DDDA allows for a parallel implementation that requires very low communication, O(log₂ N), and can also evaluate the variance of a calculation efficiently “on-the-fly.” Quantum Monte Carlo calculations are presented to illustrate “on-the-fly” variance calculations for serial and massively parallel Monte Carlo calculations.

© 2007 Wiley Periodicals, Inc. J Comput Chem 28: 2309–2316, 2007

Key words: Quantum Monte Carlo; serial correlation; parallel computing; variance statistic

Introduction

Monte Carlo methods are becoming increasingly important in calculating the properties of chemical, biological, materials, and financial systems.
The underlying algorithms of such simulations (e.g., the Metropolis algorithm¹) often involve Markov chains. The data generated from the Markov chains are serially correlated, meaning that the covariance between data elements is nonzero. Because of this, care must be taken to obtain the correct variances for observables calculated from the data.

Data blocking algorithms to obtain the correct variance of serially correlated data have been part of the lore of the Monte Carlo community for years. Flyvbjerg and Petersen were the first to formally analyze the technique,² but at least partial credit should be given to Wilson,³ Whitmer,⁴ and Gottlieb and co-workers⁵ for their earlier contributions.

We propose a new blocking algorithm, the dynamic distributable decorrelation algorithm (DDDA), which gives the same results as the Flyvbjerg-Petersen algorithm but allows the underlying variance of the serially correlated data to be analyzed “on-the-fly” with negligible additional computational expense. DDDA is also ideally suited for parallel computations because only a small amount of data must be communicated between processors to obtain the global results. Furthermore, we present an efficient method for combining results from individual processors in a parallel calculation that allows fast “on-the-fly” result analysis for parallel calculations. Example calculations showing “on-the-fly” variance calculations for serial and massively parallel calculations are also presented.

All current blocking algorithms require O(mN) operations to evaluate the variance m times during a calculation of N steps. DDDA requires only O(N + m log₂ N). Furthermore, current algorithms require communicating O(N) data during a parallel calculation to evaluate the variance. DDDA requires only O(log₂ N). The improved efficiency permits convergence-based termination in a nearly “zero cost” manner for Monte Carlo calculations.

Correspondence to: William A. Goddard III; e-mail: wag@wag.caltech.edu
Contract/grant sponsor: Office of Scientific Computing and Office of Defense Programs; contract/grant number: DE-FGO2-97ER25308 and DOE-ASC-LLNL-B523297
Contract/grant sponsor: Fannie and John Hertz Foundation
*Present address for R.P.M.: Multiscale Computational Materials Methods, Sandia National Laboratories, Albuquerque, New Mexico 87185-1322
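The cost argument in the Introduction can be illustrated with a minimal Python sketch of an incremental blocking accumulator. It is not the authors' implementation. It only shows, under stated assumptions, how keeping running sums for each block size 2^k lets the blocked error estimates be read off at any time in O(log₂ N) work, while each new sample costs amortized O(1). The names `BlockLevel`, `OnTheFlyBlocker`, `batch_blocking`, and `ar1_series` are hypothetical, the AR(1) series merely stands in for Markov-chain output, and the combination of results from several processors is not covered.

```python
import math
import random


class BlockLevel:
    """Running sums for blocks of one size (2**k samples per block)."""

    def __init__(self):
        self.n = 0            # number of completed blocks at this level
        self.total = 0.0      # sum of block averages
        self.total_sq = 0.0   # sum of squared block averages
        self.pending = None   # block average waiting for its partner

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def error_of_mean(self):
        """Standard error of the mean, treating the blocks as independent."""
        mean = self.total / self.n
        var = (self.total_sq - self.n * mean * mean) / (self.n - 1)
        return math.sqrt(max(var, 0.0) / self.n)


class OnTheFlyBlocker:
    """Hypothetical accumulator: O(log2 N) state, amortized O(1) per sample."""

    def __init__(self):
        self.levels = []

    def add(self, x):
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(BlockLevel())
            lv = self.levels[level]
            lv.add(x)
            if lv.pending is None:
                lv.pending = x      # wait for the next block at this level
                return
            x = 0.5 * (lv.pending + x)  # pair up and pass to the next level
            lv.pending = None
            level += 1

    def errors(self):
        """Return (block_size, n_blocks, error) for levels with >= 2 blocks."""
        return [(2 ** k, lv.n, lv.error_of_mean())
                for k, lv in enumerate(self.levels) if lv.n >= 2]


def batch_blocking(data):
    """Reference: repeated pairwise blocking over the stored data, O(N)."""
    out = []
    block = list(data)
    while len(block) >= 2:
        n = len(block)
        mean = sum(block) / n
        var = sum((b - mean) ** 2 for b in block) / (n - 1)
        out.append(math.sqrt(var / n))
        block = [0.5 * (block[i] + block[i + 1]) for i in range(0, n - 1, 2)]
    return out


def ar1_series(n, phi, seed=0):
    """Serially correlated test data: x_t = phi * x_(t-1) + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

In use, a simulation would call `add` once per sample and query `errors()` whenever a convergence check is wanted. The error at block size 1 is the naive estimate that ignores correlation. For positively correlated data the estimate grows with block size and levels off once the blocks are longer than the correlation time, and that plateau is the value a stopping rule would compare against the target precision.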