Abstract—Simulation of multiphysics phenomena is of great importance for science and engineering. It involves the solution of large sparse systems of linear equations. Solving such systems with direct methods ensures good performance but requires large amounts of memory. Out-of-core algorithms utilize secondary storage to overcome the increased memory needs. The outstanding efficiency of solid state non-volatile memory (NVM) motivated us to study the performance of out-of-core sparse linear system solvers on SSDs. We evaluated three direct solvers with out-of-core functionality (IBM WSMP, INTEL MKL PARDISO and MUMPS) on both SSD and HDD storage. Experimental results show that WSMP and PARDISO benefit the most, while the performance gain for MUMPS is smaller.

Index Terms— solid state NVM, SSD, multiphysics, scientific calculations, scientific computing, sparse matrices, direct solvers.

I. INTRODUCTION

Efficient solution of systems of linear equations with large sparse matrices is fundamental in the computational sciences, since studying real-world problems involves such computations. Multiphysics and multidomain simulation software (e.g. COMSOL, ANSYS, Code_Aster and FEniCS) utilizes either direct or iterative methods to solve the linear systems that arise in a multiphysics problem. Sparse linear systems are also found in cell placement optimization algorithms in the Electronic Design Automation (EDA) industry [1]. Direct methods are more robust than iterative ones, but require large amounts of memory as the size of the problem grows. Some direct solvers use out-of-core (OOC) algorithms to overcome the increased main memory needs. These algorithms are designed to efficiently fetch and access data stored in secondary storage. The development of external memory algorithms for solving systems with large matrices was a popular research topic in the recent past.
The performance issues due to the bandwidth and latency of magnetic disks were addressed by utilizing clusters with distributed memory and high-bandwidth interconnections, at high cost. Nowadays, the emergence of flash storage presents new opportunities for out-of-core computing.

In recent years, flash memory has emerged as a widely used storage medium. SSDs based on flash memory have no mechanical moving parts, consume little power, and provide high random read/write performance. Increased reliability and decreased cost make them the storage medium of choice. The outstanding performance of enterprise-level flash-based storage motivated the authors of [2] to investigate the performance of out-of-core sparse matrix-vector multiplication (SpMV) on a small SSD testbed cluster.

Solving large sparse linear systems efficiently is of significant importance for science and engineering. This necessity motivated us to study the performance of out-of-core algorithms on NVM; our initial work on the performance of the INTEL MKL PARDISO and MUMPS direct solvers is presented in [3]. In this paper we summarize the results from [3], introduce an evaluation of the IBM WSMP solver, and enrich the experiments with two new linear systems.

II. DIRECT SOLVERS

Most direct sparse methods rely on Gaussian elimination and involve the factorization of the coefficient matrix. In general, they solve a linear system in four phases: a) ordering, b) analysis and symbolic factorization, c) numerical factorization, and d) forward and backward substitution, including iterative refinement. Fill-in in the factors increases memory requirements, and parallelization on modern multi-core systems aggravates the problem. To overcome this issue, some solvers have incorporated OOC algorithms.
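The link between the ordering phase and fill-in can be illustrated with a minimal sketch. It is not taken from any of the solvers studied here: the helper names `symbolic_fill` and `arrow_pattern` are hypothetical, and the code only counts the new nonzeros that symmetric elimination would create on a toy "arrow" sparsity pattern under two different elimination orders.

```python
def symbolic_fill(adj, order):
    """Count fill-in entries created when eliminating vertices in `order`.

    adj maps each vertex to the set of its neighbours (symmetric sparsity
    pattern, no self loops). Hypothetical helper, for illustration only.
    """
    g = {v: set(n) for v, n in adj.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = sorted(g.pop(v))
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v makes its remaining neighbours pairwise adjacent;
        # every edge that has to be added is one fill-in entry.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


def arrow_pattern(n):
    """Pattern of an n x n 'arrow' matrix: vertex 0 is coupled to all others."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


n = 6
pattern = arrow_pattern(n)
natural = list(range(n))                 # dense row/column eliminated first
reordered = list(range(n - 1, -1, -1))   # dense row/column eliminated last

fill_natural = symbolic_fill(pattern, natural)
fill_reordered = symbolic_fill(pattern, reordered)
print(fill_natural, fill_reordered)  # prints "10 0"
```

With the natural order the factor of this toy pattern becomes completely dense, while the reversed order produces no fill at all. Production solvers use far more elaborate ordering heuristics, but the memory effect they target is the same.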
PARDISO [4] is a shared-memory parallel direct solver which supports real (R), complex, symmetric (S), structurally symmetric, unsymmetric (U), positive definite (PD), indefinite (I) and Hermitian systems. It provides out-of-core functionality, utilizing external memory to store the matrix factors. MUMPS [5][6] is a parallel direct solver for sparse linear equations that uses external memory to store the matrix factors. It supports unsymmetric, symmetric positive definite and general symmetric systems. MUMPS relies on MPI for parallelization; a host processor distributes the matrix and aggregates the results. The IBM WSMP [7] package provides a multithreaded OOC solver for real symmetric systems. It uses

Performance study of Out-of-Core Direct Sparse Solvers in Flash Storage
Athanasios Fevgas, Panagiota Tsompanopoulou, and Panayiotis Bozanis
Department of Electrical & Computer Engineering, University of Thessaly, Volos, Greece
e-mails: {fevgas, yota, pbozanis}@inf.uth.gr