J. Parallel Distrib. Comput. 70 (2010) 270–281
Journal homepage: www.elsevier.com/locate/jpdc

A parallel direct solver for the self-adaptive hp Finite Element Method

Maciej Paszyński a,*, David Pardo b, Carlos Torres-Verdín c, Leszek Demkowicz d, Victor Calo e

a Department of Computer Science, AGH University of Science and Technology, Cracow, Poland
b IKERBASQUE (Basque Foundation for Sciences) and BCAM (Basque Center for Applied Mathematics), Bilbao, Spain
c Department of Petroleum and Geosystems Engineering, The University of Texas in Austin, USA
d Institute for Computational Engineering and Sciences, The University of Texas in Austin, USA
e Earth Science and Engineering, Applied Mathematics and Computational Science, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Article history: Received 18 October 2008; received in revised form 20 May 2009; accepted 21 September 2009; available online 25 September 2009

Keywords: Parallel direct solvers; Finite Element Method; hp adaptivity; 3D borehole resistivity

Abstract

In this paper we present a new parallel multi-frontal direct solver dedicated to the hp Finite Element Method (hp-FEM). The self-adaptive hp-FEM generates, in a fully automatic mode, a sequence of hp-meshes delivering exponential convergence of the error with respect to the number of degrees of freedom (d.o.f.) as well as the CPU time, by performing a sequence of hp refinements starting from an arbitrary initial mesh. The solver constructs an initial elimination tree for an arbitrary initial mesh and expands the elimination tree each time the mesh is refined. This allows us to keep track of the order of elimination for the solver.
The solver also minimizes memory usage by de-allocating the partial LU factorizations computed during the elimination stage and recomputing them for the backward substitution stage; the recomputation takes only about 10% of the computational time needed for the original computations. The solver has been tested on 3D Direct Current (DC) borehole resistivity measurement simulation problems. We measure the execution time and memory usage of the solver on a large regular mesh with 1.5 million degrees of freedom as well as on a highly non-regular mesh, generated by the self-adaptive hp-FEM, with finite elements of various sizes and polynomial orders of approximation varying from p = 1 to p = 9. The presented experiments show that the parallel solver scales well up to the maximum number of utilized processors. The limit of the solver's scalability is the longest sequential part of the algorithm: the computation of the partial LU factorizations along the longest path from the root of the elimination tree down to the deepest leaf.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

The paper presents a new parallel direct solver designed for the hp Finite Element Method (FEM) [4]. Sequential and parallel 2D and 3D hp adaptive FE codes [5,20,19,18] automatically generate a sequence of optimal FE meshes providing exponential convergence of the numerical error of the solution with respect to the CPU time and the mesh size, expressed in terms of the number of degrees of freedom (d.o.f.). The sequence of meshes is generated automatically by performing h or p refinements. The h refinement consists of breaking a finite element into several new smaller elements; the p refinement consists of increasing the polynomial order of approximation over some finite element edges, faces, and interiors.
As we refine the grid, the number of finite elements increases and the polynomial orders of approximation associated with each edge and interior change.

The fully automatic hp adaptive algorithm [5] delivers a sequence of optimal hp-meshes that enables an accurate approximation of challenging engineering problems. However, the computational cost needed to solve the problem of interest over this sequence of meshes is large. Thus, there is a need to utilize efficient parallel direct solvers.

Before describing the idea of our new solver, we begin with a short introduction to existing direct solution methods. Single-processor direct solvers for FE computations are typically frontal solvers or multi-frontal solvers. The frontal solver [10,6] browses finite elements one by one to aggregate d.o.f. Fully assembled d.o.f. are eliminated from the single front matrix. The multi-frontal solver [8,7] constructs the assembly tree based on an analysis of the connectivity data or the geometry of the computational mesh. Finite elements are joined into pairs, and fully assembled d.o.f. are eliminated within frontal matrices associated with the multiple branches of the tree. The process is repeated until the root of the assembly tree is reached. Finally, the common interface problem is solved and partial backward substitutions are recursively called on the assembly tree.

* Corresponding author. E-mail address: paszynsk@agh.edu.pl (M. Paszyński).
doi:10.1016/j.jpdc.2009.09.007
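The multi-frontal procedure outlined in the introduction can be illustrated on a toy problem. The sketch below is not the solver presented in this paper; it is a minimal sequential illustration under stated assumptions: a 1D chain of n two-node elements, a hypothetical element matrix and load vector (ELEM_A, ELEM_B), and an assembly tree obtained by halving the element range. Neighboring fronts are joined in pairs, the single fully assembled node between them is eliminated by a Schur complement, the 2x2 interface problem is solved at the root, and backward substitutions are called recursively down the tree. All function names are introduced here for illustration only, and exact rational arithmetic is used so that the result can be checked without round-off.

```python
from fractions import Fraction

# Hypothetical element data (an assumption for illustration, not taken from
# the paper): every element couples its two end nodes through the same
# symmetric positive definite 2x2 matrix and the same load vector.
ELEM_A = [[Fraction(2), Fraction(-1)], [Fraction(-1), Fraction(2)]]
ELEM_B = [Fraction(1), Fraction(1)]


def build_front(lo, hi):
    """Build the assembly-tree node covering elements lo..hi-1 (nodes lo..hi)."""
    if hi - lo == 1:
        # Leaf: a single element; its two nodes form the front and nothing
        # is eliminated yet.
        return {"nodes": (lo, hi), "A": [row[:] for row in ELEM_A],
                "b": ELEM_B[:], "kids": None}
    mid = (lo + hi) // 2
    left, right = build_front(lo, mid), build_front(mid, hi)
    L, R, bl, br = left["A"], right["A"], left["b"], right["b"]
    # Node `mid` is the only node shared by the two child fronts, so after
    # joining them it is fully assembled and can be eliminated.
    d = L[1][1] + R[0][0]   # pivot
    bm = bl[1] + br[0]      # assembled load at `mid`
    # Schur complement onto the remaining interface nodes (lo, hi).
    A = [[L[0][0] - L[0][1] * L[1][0] / d, -L[0][1] * R[0][1] / d],
         [-R[1][0] * L[1][0] / d, R[1][1] - R[1][0] * R[0][1] / d]]
    b = [bl[0] - L[0][1] * bm / d, br[1] - R[1][0] * bm / d]
    # Keep the eliminated row so that `mid` can be recovered later.
    return {"nodes": (lo, hi), "A": A, "b": b, "kids": (left, right),
            "pivot": (mid, d, L[1][0], R[0][1], bm)}


def back_substitute(front, u):
    """Recover eliminated nodes, walking from the root down to the leaves."""
    if front["kids"] is None:
        return
    mid, d, c_lo, c_hi, bm = front["pivot"]
    lo, hi = front["nodes"]
    u[mid] = (bm - c_lo * u[lo] - c_hi * u[hi]) / d
    for kid in front["kids"]:
        back_substitute(kid, u)


def solve(n):
    """Solve the assembled system for a chain of n >= 1 elements."""
    root = build_front(0, n)
    A, b = root["A"], root["b"]
    # Root interface problem: a 2x2 system for the two end nodes.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    u = [None] * (n + 1)
    u[0] = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    u[n] = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    back_substitute(root, u)
    return u


def residual(u):
    """Return A u - b for the globally assembled system (for checking only)."""
    n = len(u) - 1
    r = [Fraction(0)] * (n + 1)
    for e in range(n):
        for i in range(2):
            r[e + i] -= ELEM_B[i]
            for j in range(2):
                r[e + i] += ELEM_A[i][j] * u[e + j]
    return r
```

Calling `solve(8)` returns the nine nodal values, and `residual(solve(8))` can be used to confirm that they satisfy the assembled system. In this sketch the two subtrees below any node are independent of each other until they are joined, which is the property a parallel multi-frontal solver can exploit; the chain of joins from a leaf to the root remains sequential.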