60 | | || Nested simulation - restart run || 29/01/2020 || || After recurrent maintenance-related breaks on HLRN, restart simulation started again. Simulation alternately crashes either with a HDF 5 error in the parent or in reading the restart data. In the parent this happens while reading the Netcdf input data. At most of the ranks there is no problem with the NetCDF input, however, at some ranks the NF90_INQUIRE and NF90_INQUIRE_VARIABLE produces NetCDF error codes. In the child, the error is reproducible, even if the initial simulation is run again the problem occurs. This happens only at specific ranks. We will downscale the simulation to debug this more efficiently. || |
| 60 | || Nested simulation - restart run || 29/01/2020 || || After recurrent maintenance-related breaks on HLRN, restart simulation started again. Simulation alternately crashes either with a HDF 5 error in the parent or in reading the restart data. In the parent this happens while reading the Netcdf input data. At most of the ranks there is no problem with the NetCDF input, however, at some ranks the NF90_INQUIRE and NF90_INQUIRE_VARIABLE produces NetCDF error codes. In the child, the error is reproducible, even if the initial simulation is run again the problem occurs. This happens only at specific ranks. We will downscale the simulation to debug this more efficiently. '''(Un)fortunately these problems do not occur any more after HLRN runs more stable, so that the reason for these crashes cannot be traced back. '''|| |
| 61 | || Nested simulation || 06/02/2020 || || After several fixes on HLRN side, I started the whole simulation with debug prints again. Initial simulation did not show any problems. The following restart run also run fine, no problem with NF90_INQUIRE as well as with empty binary files. The second restart run is queued now. We are at t ~ 2940 s. || |