Introducing the Grand Ensemble

Exploring a new computer system during its installation phase is a
challenging task. However, very early access offers unique
opportunities for simulations.

One of the major problems in climate modelling is getting the
opportunity to tune models at higher resolutions, as this is part of
model development in preparation for scientific work. After posing
this problem to computing centre directors, we finally received an
offer from Thomas Schulthess, director of the Swiss National
Supercomputing Centre (CSCS), to perform such a tuning process for
MPI-ESM1 in its HR resolution at CSCS.

Shortly after the machine at CSCS had been powered on, we got access
to the system and started porting the model. An important remark:
part of the agreement was that jobs could be killed at any time and
the whole system could be rebooted whenever needed, without warning.
The first step was installing the required software stack, as the
machine had none yet. Transferring the input and boundary condition
data required support from all involved institutions in terms of
network configuration and disk space usage: since most computing
centres have insufficient disk capacity, proper short-term data
caching and immediate transfer to the German Climate Computing Centre
(DKRZ) were essential.

Following these basic steps, the model was compiled and the porting
carried out. If no problems arise, this consists of two steps: first
an AMIP simulation, which is validated against the already evaluated
simulations at DKRZ, followed by a pre-industrial control (piControl)
run of the fully coupled model. The results need to be in line with
the same simulation done before at DKRZ. This part of the porting is
still based on the MPI-ESM1 LR model resolution. As these tests
passed without problems, we could start the MPI-ESM1 HR tuning
process.
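
As an illustration of what such a comparison can look like, the
sketch below checks the time mean of one variable of the new run
against the DKRZ reference. The file names, the variable and the
tolerance are placeholders; the actual validation relied on the
established evaluation chain at DKRZ.

    import numpy as np
    from netCDF4 import Dataset

    def compare_climatology(new_file, reference_file, varname="tas", rtol=1e-3):
        """Crude port-validation check: compare the time-mean field of
        one variable between the run on the new machine and the DKRZ
        reference run."""
        with Dataset(new_file) as new, Dataset(reference_file) as ref:
            # Average over the time axis (assumed to be the first dimension).
            new_mean = np.asarray(new.variables[varname][:]).mean(axis=0)
            ref_mean = np.asarray(ref.variables[varname][:]).mean(axis=0)
        return np.allclose(new_mean, ref_mean, rtol=rtol)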

As the machine environment is not particularly stable during this
part of the installation of such a big machine, one has to take care
that the restart cycle of the individual model jobs is sufficiently
short to keep the simulated years per day sufficiently high. A look
at the queueing system status revealed that the tuning process
required only a very small part of the machine, which led to the idea
of running a large ensemble of historical runs, as defined by CMIP5,
on the empty part of the machine. CSCS accepted this idea, and so we
started.
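
To illustrate the restart-cycle trade-off mentioned above, the rough
estimate below shows that shorter segments lose less work when a job
is killed but pay the queue wait more often. The constant kill rate
and all numbers are assumptions for illustration, not measurements
from the CSCS system.

    import math

    def simulated_years_per_day(years_per_segment, hours_per_simulated_year,
                                queue_wait_hours, kill_rate_per_hour):
        """Pessimistic throughput estimate for a chain of restart segments.

        A killed segment is assumed to be lost entirely and resubmitted.
        With a constant kill rate, a segment of wall time t survives with
        probability exp(-rate * t), so the expected number of attempts per
        completed segment is exp(rate * t).
        """
        wall_time = years_per_segment * hours_per_simulated_year
        expected_attempts = math.exp(kill_rate_per_hour * wall_time)
        hours_per_segment = expected_attempts * (wall_time + queue_wait_hours)
        return 24.0 * years_per_segment / hours_per_segment

Plugging in plausible numbers shows that long segments quickly become
counterproductive once the kill rate is non-negligible, which is why
the restart cycle was kept short.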

Two tool sets were essential for getting this done by a single
maintainer. One was the, at that time, newly developed templated
run-script system mkexp, which allows flexible adaptation for large
numbers of concurrent experiments; the second was the easy-to-deploy
meta-scheduler cylc, used for controlling and orchestrating the
simulations. The latter was especially useful in handling job
failures.
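
The sketch below only illustrates the idea behind such templated run
scripts; mkexp has its own configuration and template format, and the
experiment naming and the run_mpiesm.sh call used here are
hypothetical.

    from pathlib import Path
    from string import Template

    RUN_TEMPLATE = Template("""\
    #!/bin/bash
    #SBATCH --job-name=${expid}
    # hypothetical job body; the real scripts are generated by mkexp
    ./run_mpiesm.sh ${expid} ${restart_file}
    """)

    def write_run_scripts(restart_files, outdir="experiments"):
        """Create one run script per historical realisation from a
        single template."""
        Path(outdir).mkdir(exist_ok=True)
        for i, restart in enumerate(restart_files, start=1):
            expid = f"hist{i:04d}"
            script = RUN_TEMPLATE.substitute(expid=expid, restart_file=restart)
            Path(outdir, f"{expid}.run").write_text(script)

One template plus a list of initial states is then enough to spawn
all realisations, while cylc takes care of cycling and resubmitting
the resulting jobs.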

How to initialize the ensemble? The most straightforward way was to
use the piControl simulation and take a 1 January state every other
year, as each of these describes one possible state of the climate at
the beginning of the 1850s.
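
A minimal sketch of this selection step, assuming a hypothetical
restart-file naming scheme; the actual file layout of the piControl
run differs.

    from pathlib import Path

    def pick_initial_states(picontrol_restart_dir, n_members=100, stride_years=2):
        """Pick 1 January restart states from the piControl run,
        skipping every other year, to serve as ensemble initial
        conditions."""
        # Hypothetical naming scheme: restart_<EXPID>_<YYYY>0101.nc
        restarts = sorted(Path(picontrol_restart_dir).glob("restart_*0101.nc"))
        return restarts[::stride_years][:n_members]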

As the capacity of the file system was very low in this installation
phase of the production machine, changes to enable just-in-time
primary post-processing were necessary. This involved access to an
available post-processing cluster and setting up the multi-cluster
feature of the available batch system.
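
The following sketch shows how such cross-cluster submission can
look, assuming a Slurm-like batch system where jobs can be routed
with --clusters; the cluster name and the post-processing script are
placeholders.

    import subprocess

    def submit_postprocessing(expid, output_file, cluster="postproc"):
        """Submit a primary post-processing job to a separate cluster
        via the batch system's multi-cluster feature."""
        cmd = [
            "sbatch",
            f"--clusters={cluster}",         # route the job to the other cluster
            f"--job-name=pp_{expid}",
            "postprocess.sh", str(output_file),  # placeholder script and argument
        ]
        subprocess.run(cmd, check=True)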

The system setup ran for almost five months. Unfortunately, some
output files from the piControl experiment were lost due to network
outages and machine failures on the post-processing server.

What is available?

  • one piControl run of 2682 years
  • 100 realisations of historical runs (CMIP5)
  • 67 realisations of runs with a 1 % per year increase of CO2 concentration (CMIP5)


Acknowledgements


  • Kalle (mkexp)
  • Hilary Oliver, NIWA (cylc)
  • Will Sawyer and CSCS staff
  • Rainer and Alex (CIS)
  • Thomas Schulthess, CSCS