= Status quo =

----

== Current release ==

'''The release candidate of PALM-4U was released in October 2018. Visit [[http://www.palm4u.org|palm4u.org]].'''

----

The release candidate includes the following new components:

 * Multi-agent system
 * RANS mode (TKE-epsilon closure)
 * Indoor climate and energy demand module
 * Emission module
 * Aerosol physics/chemistry (SALSA)
 * RANS-LES and RANS-RANS nesting
 * Biometeorology output
 * Virtual measurement module
 * Graphical user interface (GUI) for users from practice

----

== Validation runs ==

=== VDI 3783 Part 9 ===

Status message: '''completed'''

||=Description =||=Date of issue =||=Closing date =||
||'''Validation completed''' ||'''23/07/2019''' ||'''23/07/2019''' ||

Validation protocol and details: [http://palm.muk.uni-hannover.de/trac/wiki/doc/tec/evaluation here]

----

=== Run 01 (VALM01): Winter 2017 Berlin, Jan 17 06:00 UTC - Jan 18 06:00 UTC ===

Status message: '''debugging'''

||=Description =||=Date of issue =||=Closing date =||=Further remarks =||
||Start of VALM01 testing ||01/01/2019 || || ||
||... ||... || || ||
||Crash in nested runs ||April-May ||... ||Smaller test simulations run well. ||
||Bug in radiation in a nested run ||21/05/2019 ||22/05/2019 || ||
||... ||... || || ||
||Fix numerical issues that lead to unrealistic concentrations of chemical compounds in case of (offline) nesting ||July ||end of July || ||
||Memory demand for the calculation of view factors (radiative transfer) ||01/08/2019 ||03/08/2019 ||The OOM killer aborted processes randomly. Not using all cores on a node fixed this. ||
||Parent and child grids do not overlap (required after the revision of the nesting) ||05/08/2019 ||05/08/2019 ||New child drivers are required as the number of grid points changed. ||
||Driver problem with bridges at the boundary ||05/08/2019 ||05/08/2019 ||The child was moved a few metres northward. ||
||Bug in building parameters, wrong dimension in the static input file ||08/08/2019 ||08/08/2019 || ||
||Bug when green roofs are present ||09/08/2019 ||10/10/2019 ||Green roofs were disabled in the simulation. Fixed now. ||
||MPI network problems on the Cray machine in Berlin ||10/08/2019 ||19/08/2019 ||Recurring MPI failures at different locations in the code; they appeared only in the large winter IOP simulations, not in smaller ones. As a consequence, all runs were carried out on the Atos machine in Göttingen. ||
||Minor bug in the new implementation of external radiative forcing ||21/08/2019 ||21/08/2019 || ||
||Failure due to MPI errors ||05/09/2019 ||21/09/2019 ||Also appeared in smaller test simulations. Needs to be fixed on the HLRN side. ||
||Emission module caused a model crash ||23/09/2019 ||27/09/2019 || ||
||MPI error right at the beginning of the run - HLRN-internal problem: ofi fabric is not available ... ||24/09/2019 ||? ||The error does not appear any more. ||
||Crashed with error message ''corrupted double-linked list'' ||01/10/2019 ||07/10/2019 ||It seems that scheduling is not working properly; the job had been queued for 4 days. Further debug messages were implemented to narrow down the location. The message comes from the parent domain. However, the memory-consuming sky-view factors had already been calculated. ||
||Crashed with ''An allocatable array is already allocated'' ||09/10/2019 ||10/10/2019 || ||
||Crashed with an MPI error ||13/10/2019 || ||The parent finished initialization. Crashes in an MPI_ALLGATHER call in surface_data_output_init. Might be connected to the HLRN network problem (14/10/2019). ||
||Crashed again with error message ''corrupted double-linked list'' in the child simulation ||17/10/2019 || ||The parent finished initialization. Crashes again in surface_data_output_init. Next step: switch off the surface-data output, as it has no priority at the moment. '''Note: due to limited resources at the HLRN site, queuing times are quite long, sometimes several days.''' ||
||Crashed with ''Floating divide by zero'' ||23/10/2019 || ||The error seems to be raised within routine ''drydepo_aero_zhang_vd'' and occurs after time stepping has started (initialization finished). Further debugging of this error is ongoing. ||
||Start of a child-only simulation ||23/10/2019 || ||Due to the continuous errors within the nested simulation, a non-nested (child-only) simulation was started to get first results for evaluation. The simulation is still running (06/11/2019). ||
||Finished the child-only simulation ||27/11/2019 || ||The simulation crashed at 11:57:34.95 UTC with an input/output error. Data up to that point is saved. ||
||Crashes in MPI_INIT ||03/11/2019 || ||The simulation crashed several times in MPI_INIT (environment problems). ||
||Crash ||03/12/2019 || ||Program abort due to the check of surface_fractions; the check was revised so that surface fractions can also be set at building grid points. ||
||Crash ||06/12/2019 || ||HDF5 error - could not be reproduced. ||
||Crash ||08/12/2019 || ||Floating invalid in the advection of the u-component at the first timestep. Unfortunately, this error could not be reproduced. '''Remark:''' jobs were queued for about a week on HLRN due to too low capacities, so investigation and bug tracing were delayed. ||
||Parent simulation ||23/12/2019 || ||Investigation proceeds on HLRN Berlin. The parent simulation runs for an hour; the results look plausible. ||
||Nested simulation - numerical issues ||27/12/2019 || ||The nested simulation ran for 1 minute. However, large oscillations in the u- and v-components could be observed within the child. I hypothesize that this is due to the 3D initialization of the child from the parent: because of mismatches in the building configuration (caused by the large grid aspect ratio), many grid points in the child remain zero after initialization even though they belong to the atmosphere. Since the mass flux is largely affected by this, strong oscillations arise within the child, finally leading to a crash. ||
||Nested simulation ||02/01/2020 || ||Simulations repeatedly hang or crash. The Lustre system in Berlin is still not fully set up, so simulations repeatedly hang or crash due to the slow filesystem. ||
||Nested simulation - initial run ||03/01/2020 || ||The Lustre filesystem issues seem to be solved for now. The initialization of the child has been changed: the child is now initialized via the dynamic driver rather than via the coupler; this way all atmosphere grid points are initialized appropriately. The nested simulation is at t = 30 min. '''First estimate of the duration:''' in 12 h real time on 6720 cores we simulate about 1 h. With 30 h of simulation time (00:00:00 UTC - 06:00:00 UTC, next day), we will need about 30 restarts. Since the machine in Berlin now starts to fill up with other users, we will only be able to run one simulation per day (optimistic scenario), so this will take at least one month. ||
||Nested simulation - restart run ||09/01/2020 || ||The simulation crashes while reading the restart data for one PE in the child. ||
||Nested simulation - restart run ||29/01/2020 || ||After recurrent maintenance-related breaks on HLRN, the restart simulation was started again. It alternately crashes either with an HDF5 error in the parent or while reading the restart data. In the parent this happens while reading the NetCDF input data: on most ranks there is no problem with the NetCDF input, but on some ranks NF90_INQUIRE and NF90_INQUIRE_VARIABLE return NetCDF error codes. In the child the error is reproducible; even if the initial simulation is run again, the problem occurs, and only on specific ranks. We will downscale the simulation to debug this more efficiently. '''(Un)fortunately, these problems no longer occur now that HLRN runs more stably, so the reason for these crashes cannot be traced back.''' ||
||Nested simulation ||06/02/2020 || ||After several fixes on the HLRN side, the whole simulation was started again with debug prints. The initial simulation did not show any problems. The following restart run also ran fine, with no problems with NF90_INQUIRE or with empty binary files. The second restart run is queued now. We are at t ~ 2940 s. ||
||Nested simulation ||10/02/2020 || ||We are at 03:00 UTC. The model run crashed in biometeorology_mod at the first timestep after a restart. The crash could be traced back to a NaN in pt_av at a single grid point. All other quantities, including pt, look reasonable. ||
||Nested simulation ||27/02/2020 ||Simulation reached 04:00 UTC. ||The simulation was started again. It now crashes again after a restart while reading the array ''surf_h(0)%end_index'', where some unreasonable values occur; on all other processes the values of this array look correct. ||
||Nested simulation ||12/03/2020 ||Simulation was started again. ||After several optimizations were made in the synthetic turbulence generator and some minor bugs were fixed, the simulation was started again. The Berlin complex is under maintenance now. ||
||Nested simulation ||25/03/2020 ||Simulation was running until exactly 05:00 UTC; crashed with a floating overflow in the child domain. ||The last restart time was 04:55 UTC; flow fields and surface data look reasonable. Restarting from the last restart step with the traceback option and print statements revealed a floating overflow in the output of the averaged 3D variable 'theta' at grid point (k,j,i) = (97,117,968), which is far away from any building. This is probably also related to a restart problem where faulty data is read for pt_av. Proceeding without averaged data output worked. ||
||Nested simulation ||01/04/2020 ||Simulation has reached 06:05 UTC. ||At the moment we are out of computing time. The IOP has started, i.e. measurements are output. However, it turned out that the unstructured output of the virtual measurements currently consumes far too much CPU time. With the smaller number of processes in test simulations this did not become obvious; with a large number of processes, however, the probability that I/O processes interfere with each other becomes higher, so the slowdown of the I/O becomes more pronounced. First we need to accelerate the output before we can proceed. Moreover, further debugging most probably narrowed the reason for the restart failures down to file-system issues rather than PALM-internal problems (a trouble ticket has been sent to the computing centre). ||
||Nested simulation ||02/06/2020 ||Simulation has reached 06:40 UTC. ||The output issues are solved and we have CPU time again. ||
||Nested simulation ||25/06/2020 ||Simulation is at 01:51 UTC (2nd day). ||The simulation is stopped because we have run out of computing time. ||
||Nested simulation ||08/07/2020 ||Simulation is still at 01:51 UTC (2nd day). ||We got new computing-time resources on 1 July, but the Lise system has been down for several days due to file-system problems, so jobs cannot be executed. ||
||Nested simulation ||17/07/2020 ||Simulation is at 02:10 UTC (2nd day). ||Data output on the Lise system has been extremely slow since 13/07/2020, so the progress made in a simulation job is only 2-3 min (instead of 1 h before the system maintenance). ||
||Nested simulation ||23/07/2020 ||Simulation is at 05:41 UTC (2nd day). ||The slow data output on the Lise system is gone since the last reboot on 17/07/2020. ||
||Nested simulation ||23/07/2020 ||'''Simulation is at 06:00 UTC (2nd day) - finished.''' ||The output files need to be concatenated. ||

----

=== Run 02 (VALM02): Summer 2018 Berlin, Jul 16 06:00 UTC - Jul 18 06:00 UTC ===

Status message: '''preparation'''

||=Description =||=Date of issue =||=Closing date =||=Further remarks =||
||Preparing input files for VALM02 ||30/01/2020 || || ||
||Dynamic driver ||13/03/2020 ||17/04/2020 ||An error in INIFOR prevents the dynamic-driver creation: {{{inifor: ERROR: PALM-4U grid extends above COSMO-DE model top.}}} Bug fixing is in progress. DWD created a preliminary driver with which further testing can be done. ||
||Dynamic driver ||17/04/2020 ||04/05/2020 ||Further errors in INIFOR prevented the final dynamic-driver creation. Bugs in INIFOR and the computer setup were fixed with help from DWD. ||
||Setup creation ||20/05/2020 || ||Defining the details of the setup, such as domain height, boundary conditions, and the technical setup. ||
||Nested simulation ||06/08/2020 ||Simulation is now at 00:30 UTC (1st day). ||Problems during the wall/soil spinup have been solved (the timestep was too large). Instabilities during the wall/soil spinup are investigated separately. ||
||Nested simulation ||12/08/2020 ||Simulation is at 02:58 UTC (1st day). || ||
||Error in the RTM concerning the SVF calculation ||12/08/2020 ||22/08/2020 ||Due to changes in the CPU layout, the SVF needed to be recalculated. During this step, MPI errors occurred because too much memory was required by the MPI calls. This has been solved by reducing the number of view angles for the SVF calculation. ||
||Nested simulation ||27/08/2020 ||Simulation is at 04:58 UTC (1st day). || ||

----

=== Run 03 (VALM03): Winter 2017 Stuttgart, Feb 14 06:00 UTC - Feb 16 06:00 UTC ===

Status message: '''unscheduled'''

----

=== Run 04 (VALM04): Summer 2017 Berlin, Jul 08 04:00 UTC - Jul 09 19:00 UTC ===

Status message: '''unscheduled'''

----

=== Run 05 (VALM05): Hamburg, Wind tunnel ===

Status message: '''completed'''

||=Description =||=Date of issue =||=Closing date =||
||'''Production run''' ||'''18/04/2019''' ||'''29/04/2019''' ||

{{{#!td align=left style="border: none; vertical-align:top; width: 50%"
[[Image(valm05Results.png, 40%, link=http://palm.muk.uni-hannover.de/mosaik/attachment/wiki/palm4u/status/valm05_results.pdf)]]
}}}

----

=== Run 06 (VALM06): Summer 2017 Berlin, Jul 30 06:00 UTC - Aug 01 06:00 UTC ===

Status message: '''unscheduled'''