Paving the Road towards Pre-Exascale Supercomputing

Dirk Brömmel, Ulrich Detert, Stephan Graf, Thomas Lippert, Boris Orth, Dirk Pleiter, Michael Stephan, and Estela Suarez

Institute for Advanced Simulation, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany
E-mail: th.lippert@fz-juelich.de

Supercomputing at scale has become the decisive challenge for users, providers and vendors of leading supercomputer systems. On next-generation systems, which will approach the exascale by the end of the decade, we will be confronted with millions of cores and the need for massive parallelism. Beyond aggregating ever larger compute performance, the ability to hold and efficiently process drastically increasing amounts of data will be key to enabling future leading research facilities for computational science. In this article we report on the evolving supercomputing infrastructure at the Jülich Supercomputing Centre (JSC), on research and development activities on future HPC technologies and architectures, and on computational science research and collaboration with science areas that will require exascale supercomputing in the future.

1 Introduction

In 2005 the Jülich Supercomputing Centre (JSC) started its dual system strategy to serve its application portfolio as efficiently as possible: the users of Forschungszentrum Jülich, of the John von Neumann Institute for Computing in Germany and, since mid-2010, of the Partnership for Advanced Computing in Europe (PRACE). Within the German Gauss Centre for Supercomputing, a first milestone was reached in 2009 with the installation of the IBM Blue Gene/P system JUGENE as the highly scalable system (294,912 cores) and of JUROPA (25,000 Intel Nehalem CPU cores) as the highly flexible, general-purpose cluster system. The dual system strategy was carried forward at the end of 2012 with the installation of a new, highly scalable 28-rack IBM Blue Gene/Q system named JUQUEEN, which entered the TOP500 list at rank 7 worldwide and as #1 in Europe (see Sec. 2). With its 458,752 cores, each supporting four hardware threads, more than 1.5 million hardware threads can be executed concurrently by a single application. Several applications have been shown to scale to this extent and are now members of the “JSC High-Q Club” (see Sec. 2.1). The compute system is flanked by a new GPFS storage system that, for the first time, provides full end-to-end data integrity and a maximum I/O bandwidth of 200 GByte/s. To enable future architectures in which non-volatile storage devices are integrated into the supercomputer to provide even higher bandwidth and significantly higher access rates, JSC collaborated with IBM on an active storage subsystem attached to JUQUEEN (see Sec. 3).

To prepare the next step, the replacement of the general-purpose system JUROPA by a system with a peak performance of 2 PFlop/s, a new test cluster called JUROPA-3 has been installed. The work on future architectures in which such clusters are coupled to a booster comprising tightly coupled many-core devices is continued in the new DEEP Extended Reach (DEEP-ER) project (see Sec. 5), which extends the ongoing Dynamical Exascale Entry Platform (DEEP) project. A new exascale lab, the NVIDIA Application Lab, focuses on another type of many-core architecture, namely GPUs (see Sec. 6).
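To give a concrete sense of the degree of concurrency mentioned above for JUQUEEN, the following minimal hybrid MPI+OpenMP sketch (purely illustrative and not taken from any of the applications or projects discussed in this article) simply counts the hardware threads a single application run occupies, assuming for example one MPI rank per core and four OpenMP threads per rank.

    /*
     * Illustrative sketch only: count the hardware threads used by one
     * hybrid MPI+OpenMP application run. Launched with one MPI rank per
     * core and four OpenMP threads per rank on all 458,752 JUQUEEN cores,
     * the reported total would exceed 1.8 million.
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, nthreads = 1;
        long long local, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Determine the OpenMP team size of this rank. */
        #pragma omp parallel
        {
            #pragma omp master
            nthreads = omp_get_num_threads();
        }

        /* Sum the per-rank thread counts over all MPI ranks. */
        local = nthreads;
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Application occupies %lld hardware threads on %d MPI ranks\n",
                   total, size);

        MPI_Finalize();
        return 0;
    }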