Use Cases

The following use cases are included in the project for co-design. These are all data-intensive applications that are foreseen to face severe I/O demands at exascale. The various components of the IO-SEA stack will be exploited by these use cases.

RAMSES : Astrophysical Plasma Flows

RAMSES is an open source simulation code for astrophysical compressible plasma flows, featuring self-gravitation, magnetism and radiative processes. It is based on the Adaptive Mesh Refinement (AMR) technique on a fully-threaded graded octree. It is written in Fortran 90 and makes intensive use of the MPI library.
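To illustrate the idea, here is a minimal, purely schematic sketch of AMR refinement on an octree (in Python rather than Fortran, and not the actual RAMSES data structures; the refinement criterion is a made-up example):

```python
# Minimal sketch of AMR refinement on an octree (illustrative only, not the
# RAMSES data structures): each cell can be split into 8 children, and
# refinement is driven by a user-supplied criterion applied to the cell.
class OctCell:
    def __init__(self, level, center, size):
        self.level = level
        self.center = center          # (x, y, z) of the cell centre
        self.size = size              # edge length of the cubic cell
        self.children = []            # 8 children once refined

    def refine(self, needs_refinement, max_level):
        if self.level >= max_level or not needs_refinement(self):
            return
        half = self.size / 2.0
        for dx in (-0.25, 0.25):
            for dy in (-0.25, 0.25):
                for dz in (-0.25, 0.25):
                    c = (self.center[0] + dx * self.size,
                         self.center[1] + dy * self.size,
                         self.center[2] + dz * self.size)
                    child = OctCell(self.level + 1, c, half)
                    child.refine(needs_refinement, max_level)
                    self.children.append(child)


def near_centre(cell):
    # Example refinement criterion: refine cells close to the domain centre.
    return all(abs(x - 0.5) < cell.size for x in cell.center)


# Example: refine a unit cube near its centre down to level 3.
root = OctCell(level=0, center=(0.5, 0.5, 0.5), size=1.0)
root.refine(near_centre, max_level=3)
```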

The computationally intensive routines of RAMSES have been highly optimized to take advantage of supercomputer architectures. Nevertheless, the RAMSES legacy I/O engine uses the “file per process” paradigm, which rapidly leads to I/O bottlenecks starting at around 8,000 cores. To prepare for the exascale era, the Hercule parallel I/O library (IO-SEA WP5) has been successfully integrated into RAMSES.
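For illustration, a minimal sketch of the “file per process” pattern (written in Python with mpi4py as a stand-in; the actual RAMSES routines are in Fortran 90, and the Hercule API is not shown):

```python
# Minimal sketch of the legacy "file per process" output pattern (not the
# actual RAMSES or Hercule code): every MPI rank writes its own checkpoint
# file, so the number of files and of metadata operations grows linearly
# with the number of ranks, which is what causes the bottleneck at scale.
import os
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

snapshot_dir = "output_00001"           # hypothetical snapshot directory
if rank == 0:
    os.makedirs(snapshot_dir, exist_ok=True)
comm.Barrier()                          # wait until the directory exists

# Placeholder for the AMR data owned by this rank.
local_data = np.random.rand(1_000_000)

# One file per rank: simple to write, but a snapshot on N ranks creates
# N files, which overwhelms metadata handling at large core counts.
local_data.tofile(os.path.join(snapshot_dir, f"part_{rank:05d}.dat"))

comm.Barrier()
if rank == 0:
    print(f"snapshot written by {comm.Get_size()} rank(s)")
```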

This upgrade of the RAMSES I/O system means it can now produce post-processing-specific outputs in addition to checkpoints/restarts.

Analysis of Electron Microscopy Images

Cryo-electron microscopy (cryo-EM) is a rapidly evolving technique in the field of Structural and Cellular Biology research. The method is used to derive 3D models of macromolecules from a large number of 2D projection images collected by electron microscopes. The amount of data generated during the acquisition of electron micrographs for so-called single particle analysis is significant and, moreover, deriving the 3D models requires substantial computational resources.

CEITEC operates several transmission electron microscopes, including the state-of-the-art Titan Krios, which produces 1-2 TB of raw data per day and is operated in 24/7 mode. The raw data are currently stored on local HDD storage and further processed to derive the 3D models of the studied macromolecules. The data are processed on a few bare-metal machines equipped with GPUs, also located on premises at CEITEC.

The main objective of this use case within the IO-SEA context is to use the HPC and storage resources operated by IT4I. The purpose is to reduce the processing time of the raw data produced by the electron microscope and to facilitate data publication mechanisms.
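As background, a simplified sketch of the stages a typical single-particle analysis pipeline chains together (the stage names, file names and paths below are generic placeholders, not the specific tools or layout used at CEITEC or IT4I):

```python
# Simplified, generic sketch of a single-particle cryo-EM processing
# pipeline; each stage function is a placeholder for a real tool (motion
# correction, CTF estimation, particle picking, classification, refinement).
from pathlib import Path


def run_stage(name, inputs, output_dir):
    """Placeholder for submitting one pipeline stage to the HPC cluster."""
    out = Path(output_dir) / name
    out.mkdir(parents=True, exist_ok=True)
    print(f"running {name} on {len(inputs)} input item(s) -> {out}")
    return [out / "result.star"]          # hypothetical metadata output


raw_movies = sorted(Path("raw_micrographs").glob("*.tiff"))  # hypothetical path
work = "processing"

aligned   = run_stage("motion_correction", raw_movies, work)
ctf       = run_stage("ctf_estimation",    aligned,    work)
particles = run_stage("particle_picking",  ctf,        work)
classes   = run_stage("classification_2d", particles,  work)
volume    = run_stage("refinement_3d",     classes,    work)
```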

Figure: Cryo-electron microscopy image processing pipeline

ECMWF Weather Forecasting Workflow

ECMWF is an intergovernmental organisation supported by 23 member states and 11 co-operating states. It is both a research institute and a 24/7 operational service, producing numerical weather predictions and other data for the weather and climate communities.

ECMWF’s primary forecasts consist of a 9km, 10-day high-resolution (HRES) deterministic forecast and an 18km, 15-day, 51-member ensemble (ENS) forecast – these run four times per day and cover the entire globe. Weather prediction data is incredibly valuable to a wide range of industries, but it is only valid for a short time.
As such, the forecasts are time-critical and data must be delivered to downstream consumers according to a tight schedule.

Figure: ECMWF’s tape library

ECMWF’s operational workflow is built around the flow of data: observations come in and are assimilated into the Integrated Forecasting System (IFS), whose output is then post-processed and distributed as well as archived. This workflow uses a significant part of ECMWF’s computing resources and produces about 30 TiB of model data per run.

The time-critical part of the workflow has to run within an hour. It consists of the aforementioned high-resolution forecast and 51 coarser-resolution ENS forecasts, as well as the associated post-processing ‘product generation’. As soon as either the high-resolution run or all 51 ensemble runs have produced a snapshot, about 70% of it is processed to generate products to be disseminated to the member states and customers.
This creates significant concurrent read/write contention on the filesystem and requires careful tuning of the I/O pipeline.
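A rough sketch of this trigger logic (the marker files, directory layout and polling mechanism are assumptions made for illustration, not ECMWF’s actual product-generation software):

```python
# Illustrative sketch of event-driven product generation: as soon as all
# ensemble members have finished writing a given output step, product
# generation for that step starts, so reads of completed steps overlap
# with writes of later ones.
import time
from pathlib import Path

ENS_MEMBERS = 51          # number of ensemble members (from the text)
POLL_SECONDS = 5          # polling interval; an assumption for this sketch


def step_complete(run_dir, member, step):
    # Placeholder completion check: assume each member drops a marker file
    # once it has finished writing an output step.
    return (Path(run_dir) / f"member_{member:02d}" / f"step_{step:03d}.done").exists()


def generate_products(run_dir, step):
    # Placeholder for the 'product generation' post-processing that turns
    # roughly 70% of a snapshot into products for dissemination.
    print(f"generating products for {run_dir}, step {step}")


def watch(run_dir, members, steps):
    pending = set(steps)
    while pending:
        for step in sorted(pending):
            if all(step_complete(run_dir, m, step) for m in members):
                generate_products(run_dir, step)  # read while later steps are still written
                pending.discard(step)
        time.sleep(POLL_SECONDS)


# Hypothetical usage: watch an ENS run with 3-hourly output up to 15 days.
# watch("ens_run", members=range(ENS_MEMBERS), steps=range(0, 361, 3))
```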

A sustained increase in the amount of data to be processed is expected, which will also require adapting the way forecast data is shared and accessed.

Figure: ECMWF 1 km resolution forecast run
Figure: Daily mean surface air temperature in Europe in July 2010 from ERA5

TSMP: Multi-physics Regional Earth System Model

The Terrestrial Systems Modelling Platform (TSMP) is a fully coupled, scale-consistent, highly modular and massively parallel regional Earth System Model. TSMP is a model interface that couples three core model components: the COSMO model for atmospheric simulations, the CLM land surface model and the ParFlow hydrological model. Coupling is done through the OASIS coupler. TSMP is also enabled for Data Assimilation (DA) through PDAF.

TSMP makes it possible to simulate complex interactions and feedbacks between the different compartments of terrestrial systems. Specifically, it enables the simulation of mass, energy and momentum fluxes and exchanges across the land surface, subsurface and atmosphere.
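Conceptually, the coupling can be pictured as a time-stepping loop in which the components exchange fluxes at every coupling step (a schematic sketch only; in TSMP the exchange is handled by the OASIS coupler, whose API is not shown, and the component classes below are purely illustrative placeholders):

```python
# Schematic sketch of a coupled time-stepping loop: at every coupling step
# the atmosphere, land-surface and hydrology components exchange mass,
# energy and momentum fluxes. The "physics" here is a trivial placeholder.

class Component:
    def __init__(self, name):
        self.name = name
        self.state = 0.0

    def step(self, incoming_flux):
        # Placeholder physics: advance the component using incoming fluxes
        # and return the flux handed to the next component.
        self.state += 0.1 * incoming_flux
        return self.state


atmosphere = Component("COSMO")    # atmospheric model
land       = Component("CLM")      # land-surface model
hydrology  = Component("ParFlow")  # subsurface/hydrological model

coupling_steps = 24
flux = 1.0
for step in range(coupling_steps):
    # Fluxes cascade atmosphere -> land surface -> subsurface and feed back
    # into the atmosphere at the next coupling step.
    flux = atmosphere.step(flux)
    flux = land.step(flux)
    flux = hydrology.step(flux)

print("final subsurface state:", hydrology.state)
```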

TSMP is maintained by the Simulation and Data Lab Terrestrial Systems (SDLTS) at JSC and is open-source software publicly available on GitHub (https://github.com/HPSCTerrSys/TSMP).

Lattice QCD

The goal of a numerical LQCD calculation is to understand and predict the properties of hadrons, or particles composed of quarks and gluons. The starting point is the QCD action, an equation describing the interactions of quark and gluon quantum fields. Quarks and gluons interact via the strong force and have a charge characterised by a “colour”, hence the name quantum chromodynamics. Clever mathematical transformations, namely treating time as an imaginary variable and taking space-time as a discrete and finite 4-D grid (hence the “lattice”), allow the simulation of quark and gluon fields as a statistical physics problem. First, a large statistical ensemble of background gluon field configurations (known as “gauge configurations”) is generated through Monte Carlo simulation.

Second, the configurations are re-loaded, and on each configuration operators corresponding to physical observables are calculated. The average over the ensemble of these operators is interpreted as the expectation value of the observable, such as one might measure in a suitable experiment.
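The two-phase structure of such a calculation can be summarised in a toy sketch (random numbers stand in for the gauge configurations and a trivial function for the operator; this is not an actual lattice code):

```python
# Toy sketch of the two-phase LQCD workflow: (1) generate an ensemble of
# gauge configurations by Monte Carlo, (2) reload each configuration,
# measure an operator on it, and average over the ensemble to estimate the
# expectation value of the observable. Real codes store configurations as
# large 4-D lattice fields; here a single float stands in for each one.
import random


def generate_configuration(rng):
    """Phase 1 placeholder: one Monte Carlo 'gauge configuration'."""
    return rng.gauss(0.0, 1.0)


def measure_operator(config):
    """Phase 2 placeholder: an 'operator' evaluated on one configuration."""
    return config ** 2


rng = random.Random(42)

# Phase 1: generate the ensemble (conceptually written to storage here).
ensemble = [generate_configuration(rng) for _ in range(1000)]

# Phase 2: re-read each configuration and measure; the ensemble average is
# the estimate of the observable's expectation value.
measurements = [measure_operator(c) for c in ensemble]
expectation = sum(measurements) / len(measurements)
print(f"<O> approx {expectation:.3f}")
```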

Several features of the IO-SEA solution should improve the efficiency and scientific output of LQCD calculations.

Figure: Representation of protons (with three quarks inside) and a plot showing how LQCD can explain the observed spectrum of light hadron masses