MOSAIK/PALM-4U simulation status and results

First building-resolving large-eddy simulations for entire Berlin

Presentation at ICUC10, New York City, August 2018

Validation runs

VDI 3783 Part 9

Status message: completed

| Description | Date of issue | Closing date |
| --- | --- | --- |
| Validation completed | 23/07/2019 | 23/07/2019 |

Validation protocol and details: here

Run 01 (VALM01): Winter 2017 Berlin, Jan 17 06:00 UTC - Jan 18 06:00

Status message: debugging

| Description | Date of issue | Closing date | Further remarks |
| --- | --- | --- | --- |
| Start of VALM01 testing | 01/01/2019 | | |
| ... | ... | | |
| Crash in nested runs | April-May | ... | Smaller test simulations run well. |
| Bug in radiation in a nested run | 21/05/2019 | 22/05/2019 | |
| ... | ... | | |
| Fix numerical issues that lead to unrealistic concentrations of chemical compounds in case of (offline) nesting | July | end of July | |
| Memory demand for calculation of view factors (radiative transfer) | 01/08/2019 | 03/08/2019 | The OOM killer aborted processes at random. Leaving some cores on each node unused fixed this. |
| Parent and child grids do not overlap (required after revision of the nesting) | 05/08/2019 | 05/08/2019 | New child drivers are required because the number of grid points changed. |
| Driver problem with bridges at the boundary | 05/08/2019 | 05/08/2019 | Child was moved a few meters northward. |
| Bug in building parameters: wrong dimension in static input file | 08/08/2019 | 08/08/2019 | |
| Bug when green roofs are present | 09/08/2019 | 10/10/2019 | Green roofs were disabled in the simulation. Fixed now. |
| MPI network problems on the Cray machine in Berlin | 10/08/2019 | 19/08/2019 | Recurring MPI failures at different locations in the code. They appeared only in the large winter IOP simulations, not in smaller ones. As a consequence, all runs were carried out on the Atos machine in Göttingen. |
| Minor bug in new implementation of external radiative forcing | 21/08/2019 | 21/08/2019 | |
| Failure due to MPI errors | 05/09/2019 | 21/09/2019 | Also appeared in smaller test simulations. Needs to be fixed on the HLRN side. |
| Emission module caused model crash | 23/09/2019 | 27/09/2019 | |
| MPI error just at the beginning of the run - HLRN internal problem: ofi fabric is not available ... | 24/09/2019 | ? | Error does not appear any more. |
| Crashed with error message "corrupted double-linked list" | 01/10/2019 | 07/10/2019 | Scheduling does not seem to be working properly: the job had been queued for 4 days! Further debug messages were implemented to narrow down the location. The message comes from the parent domain. However, memory-consuming sky-view factors were calculated. |
| Crashed with "An allocatable array is already allocated" | 09/10/2019 | 10/10/2019 | |
| Crashed by an MPI error | 13/10/2019 | | Parent finished initialization. Crashes in an MPI_ALLGATHER call in surface_data_output_init. Might be connected to the HLRN network problem (14/10/2019). |
| Crashed again with error message "corrupted double-linked list" in child simulation | 17/10/2019 | | Parent finished initialization. Crashes again in surface_data_output_init. Next step: switch off surface-data output, as this has no priority at the moment. Note: due to limited resources at HLRN, queuing times for simulations are quite long, sometimes several days. |
| Crashed with "Floating divide by zero" | 23/10/2019 | | The error seems to be raised within routine drydepo_aero_zhang_vd. It occurs after time stepping has started (initialization finished). Further debugging of this error is ongoing. |
| Start child-only simulation | 23/10/2019 | | Due to continuous errors within the nested simulation, a non-nested (child-only) simulation was started to get first results for evaluation. Simulation is still running (06/11/2019). |
| Finished child-only simulation | 27/11/2019 | | Simulation crashed at 11:57:34.95 UTC with an input/output error. Data up to that point are saved. |
| Crashes by MPI_INIT | 03/11/2019 | | Simulation crashed several times in MPI_INIT (environment problems). |
| Crash | 03/12/2019 | | Program abort due to the check of surface_fractions. The check was revised so that surface fractions can also be set at building grid points. |
| Crash | 06/12/2019 | | HDF5 error. Could not be reproduced. |
| Crash | 08/12/2019 | | "Floating invalid" in advection for the u-component at the first timestep. Unfortunately, this error could not be reproduced. Remark: jobs were queued for about a week on HLRN due to insufficient capacity, so investigations and bug tracing were delayed. |
| Parent simulation | 23/12/2019 | | Investigation continues on HLRN Berlin. Parent simulation runs for an hour, and the results look plausible. |
| Nested simulation - numerical issues | 27/12/2019 | | Nested simulation ran for 1 minute. However, large oscillations in the u- and v-components were observed within the child. I hypothesize that this is due to the 3D initialization of the child from the parent. Because of mismatches in the building configuration (caused by the large grid aspect ratio), many grid points in the child remain zero after initialization, even though these grid points belong to the atmosphere. Since the mass flux is strongly affected by this, strong oscillations arise within the child and finally lead to a crash. |
| Nested simulation | 02/01/2020 | | The Lustre system in Berlin is still not fully set up, so simulations repeatedly hang or crash due to the slow filesystem. |
| Nested simulation - initial run | 03/01/2020 | | Lustre filesystem issues seem to be solved for now. Initialization of the child has been changed: the child is now initialized via the dynamic driver rather than via the coupler. This way all atmosphere grid points are initialized appropriately. The nested simulation is at t = 30 min. First estimate of duration: 12 h of real time on 6720 cores simulates about 1 h. With 30 h of simulation time (00:00:00 UTC - 06:00:00 UTC, next day), we will need about 30 restarts. Since the machine in Berlin is now starting to fill up with other users, we will only be able to do 1 simulation per day (optimistic scenario), so this will take at least one month. |
| Nested simulation - restart run | 09/01/2020 | | Simulation crashes while reading the restart data. Bug tracing is under way. |
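
The duration estimate in the 03/01/2020 entry can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name estimate_campaign and its parameter names are hypothetical and not part of PALM-4U or any HLRN tooling, and the inputs are the rough figures quoted in that entry (about 1 h simulated per 12 h job on 6720 cores, 30 h to simulate, 1 job per day). The core-hour total is derived from those figures and is not stated in the log.

```python
import math


def estimate_campaign(simulated_hours, simulated_hours_per_job,
                      wallclock_hours_per_job, jobs_per_day, cores):
    """Rough campaign estimate; all inputs are assumptions taken from the log entry."""
    # Each job (initial run or restart) advances the simulation by a fixed amount.
    jobs = math.ceil(simulated_hours / simulated_hours_per_job)
    # Queueing limits how many jobs can be completed per calendar day.
    calendar_days = math.ceil(jobs / jobs_per_day)
    # Total compute budget consumed by all jobs.
    core_hours = jobs * wallclock_hours_per_job * cores
    return jobs, calendar_days, core_hours


jobs, days, core_hours = estimate_campaign(
    simulated_hours=30,          # 00:00 UTC to 06:00 UTC on the next day
    simulated_hours_per_job=1,   # about 1 h simulated per job
    wallclock_hours_per_job=12,  # 12 h real time per job
    jobs_per_day=1,              # optimistic scenario from the log
    cores=6720,
)
print(f"jobs: {jobs}, calendar days: {days}, core hours: {core_hours}")
```

Under these assumptions the result is 30 jobs over 30 calendar days, which matches the "at least one month" stated in the entry. Lowering jobs_per_day below 1 (for example 0.5 for one job every two days) shows how quickly queueing delays stretch the schedule.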

(Note that the list of past bug fixes will be updated soon.)

Run 02 (VALM02): Summer 2018 Berlin, Jul 16 06:00 UTC - Jul 18 06:00

Status message: unscheduled

Run 03 (VALM03): Winter 2017 Stuttgart, Feb 14 06:00 UTC - Feb 16 06:00

Status message: unscheduled

Run 04 (VALM04): Winter 2017 Berlin, Jul 08 04:00 UTC - Jul 09 19:00

Status message: unscheduled

Run 05 (VALM05): Hamburg, Wind tunnel

Status message: completed

| Description | Date of issue | Closing date |
| --- | --- | --- |
| Production run | 18/04/2019 | 29/04/2019 |

Run 06 (VALM06): Summer 2017 Berlin, Jul 30 06:00 UTC - Aug 01 06:00

Status message: unscheduled

Last modified on Jan 10, 2020 11:50:55 AM
