== Localized raytracing parallelization scheme ==

The localized raytracing parallelization scheme is enabled by the namelist switch {{{localized_raytracing = .TRUE.}}}. It brings a significant speedup of raytracing, avoids all MPI one-sided operations, and removes the need for several global arrays, thus improving scalability.

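A minimal namelist excerpt enabling the scheme might look as follows; the namelist group name `&radiation_parameters` is assumed here and should be checked against the parameter documentation of the PALM version in use:

{{{
 &radiation_parameters
!
!-- Hypothetical minimal excerpt: enable the localized raytracing scheme
!-- (all other radiation parameters omitted).
    localized_raytracing = .TRUE.,
 /
}}}
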
The scheme is based on splitting each ray into segments that belong to individual subdomains and tracing those segments locally in the respective processes. The raytracing record is then passed to the process owning the next segment as an MPI message (a request for raytracing). For regular faces (surface elements), each ray has to be followed forward and then backwards all the way to its origin; for other types of raytracing (e.g. MRT factors), the record returns from the end of the ray directly to the originating process.

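What travels between the processes is a small, self-contained raytracing record. Its exact contents are internal to the implementation; a purely hypothetical sketch (names and fields are illustrative, not the actual PALM data structure) might look like this:

{{{
!-- Hypothetical raytracing record (illustration only).  In practice such a
!-- record would be packed into a flat buffer or a committed MPI derived
!-- datatype so that it can be forwarded with a single message.
MODULE lrt_record_sketch
   IMPLICIT NONE
   INTEGER, PARAMETER ::  wp = KIND( 1.0d0 )

   TYPE lrt_record
      INTEGER  ::  origin_rank   !< process where the ray started (results return here)
      INTEGER  ::  face_id       !< originating surface element on that process
      REAL(wp) ::  pos(3)        !< point where the ray enters the current subdomain
      REAL(wp) ::  dir(3)        !< unit direction of the ray
      REAL(wp) ::  transparency  !< transparency accumulated along the traced segments
   END TYPE lrt_record
END MODULE lrt_record_sketch
}}}
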
Each process waits for incoming requests from other processes. It starts by posting an asynchronous `MPI_IRECV`, which makes it able to receive requests; `MPI_ANY_SOURCE` means that the requests may come from any other process. It then checks for incoming messages (requests for raytracing) with `MPI_TEST` (see `lrt_process_pending`), which returns immediately if there are none. In that case the process does one piece of its own work, after which it tests again. Upon receiving a request, it immediately posts a new `MPI_IRECV` to be able to receive further messages, serves the request (by raytracing the requested segment and sending the record on for the next segment), and goes back to checking for further incoming messages. A runnable sketch of this processing loop is given below.

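The following self-contained demo illustrates the processing loop described above. It is not the actual `lrt_process_pending` code: the message tags, the record buffer, and the dummy self-sent messages (included only so that the demo terminates when run stand-alone) are assumptions.

{{{
!-- Simplified, self-contained sketch of the request-processing loop (not the
!-- actual lrt_process_pending code).  Tags, buffer layout and the dummy
!-- self-sent messages are illustrative assumptions.
PROGRAM lrt_loop_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER ::  buf_len = 8, tag_request = 1, tag_termination = 2
   REAL(8) ::  fin_buf(buf_len), rec(buf_len), req_buf(buf_len), work(buf_len)
   INTEGER ::  ierr, myid, recv_req
   INTEGER ::  send_req(2), send_stat(MPI_STATUS_SIZE,2), stat(MPI_STATUS_SIZE)
   LOGICAL ::  flag, terminated

   CALL MPI_INIT( ierr )
   CALL MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
!
!-- Post the first asynchronous receive.  MPI_ANY_SOURCE allows requests from
!-- any other process.
   CALL MPI_IRECV( rec, buf_len, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE,        &
                   MPI_ANY_TAG, MPI_COMM_WORLD, recv_req, ierr )
!
!-- Demo only: each process sends itself one dummy "request" followed by a
!-- "termination" message, so that the loop below finishes on its own.
   req_buf = REAL( myid, 8 )
   fin_buf = 0.0_8
   CALL MPI_ISEND( req_buf, buf_len, MPI_DOUBLE_PRECISION, myid, tag_request, &
                   MPI_COMM_WORLD, send_req(1), ierr )
   CALL MPI_ISEND( fin_buf, buf_len, MPI_DOUBLE_PRECISION, myid,              &
                   tag_termination, MPI_COMM_WORLD, send_req(2), ierr )

   terminated = .FALSE.
   DO WHILE ( .NOT. terminated )
!
!--   Non-blocking check whether a message has arrived.
      CALL MPI_TEST( recv_req, flag, stat, ierr )
      IF ( .NOT. flag )  THEN
!
!--      Nothing pending: here the process would do one piece of its own work
!--      and then test again.
         CYCLE
      ENDIF
      IF ( stat(MPI_TAG) == tag_termination )  THEN
!
!--      Termination: do not re-post the receive, leave the processing loop.
         terminated = .TRUE.
      ELSE
!
!--      Copy the record, immediately re-post the receive, then serve the
!--      request (trace the local segment and forward the record).
         work = rec
         CALL MPI_IRECV( rec, buf_len, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE,  &
                         MPI_ANY_TAG, MPI_COMM_WORLD, recv_req, ierr )
         PRINT*, 'rank', myid, 'serving a request, payload:', work(1)
      ENDIF
   ENDDO

   CALL MPI_WAITALL( 2, send_req, send_stat, ierr )
   CALL MPI_FINALIZE( ierr )
END PROGRAM lrt_loop_sketch
}}}
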
Once a process has done all of its own work and cannot continue until further results arrive (as messages), it uses `MPI_WAIT` instead of `MPI_TEST`; `MPI_WAIT` does not return immediately but waits until a message arrives. The process keeps doing this until it has received all the information it needs, i.e. until all rays it has originated have returned from raytracing. A process's own loop is organized so that it first sends out rays in all azimuths from one face, then waits until they have all returned, aggregates the information into view factors, and only then continues with the rays from the next face; a schematic outline follows.

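The per-face organization can be sketched as follows. All helper routines appearing in comments (`lrt_send_ray`, `lrt_handle_message`, `lrt_aggregate_view_factors`) are hypothetical placeholders, not PALM routines.

{{{
!-- Schematic shape of the per-face driver (illustration only, not the PALM
!-- code).  recv_req is assumed to be the handle of the currently posted
!-- MPI_IRECV; the commented calls are placeholders.
SUBROUTINE lrt_trace_one_face_sketch( recv_req )
   USE mpi
   IMPLICIT NONE
   INTEGER, INTENT(INOUT) ::  recv_req
   INTEGER ::  ierr
   INTEGER ::  stat(MPI_STATUS_SIZE)
   LOGICAL ::  all_rays_returned
!
!--1. Send the rays of this face in all azimuthal directions; each becomes a
!--   raytracing record travelling across the subdomains.
!  DO  iaz = 1, n_azimuths
!     CALL lrt_send_ray( iface, iaz )
!  ENDDO
!
!--2. All own work for this face is done, so block with MPI_WAIT until a
!--   message arrives; incoming requests from other processes are still served
!--   while waiting for the own rays to return.
   all_rays_returned = .FALSE.
   DO WHILE ( .NOT. all_rays_returned )
      CALL MPI_WAIT( recv_req, stat, ierr )
!     CALL lrt_handle_message( stat, recv_req, all_rays_returned )  ! placeholder
      all_rays_returned = .TRUE.    ! placeholder so that the sketch terminates
   ENDDO
!
!--3. Only now aggregate the returned contributions into view factors and move
!--   on to the next face.
!  CALL lrt_aggregate_view_factors( iface )
END SUBROUTINE lrt_trace_one_face_sketch
}}}
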
After the processes have finished all their work, they pass around a round of ''completion'' messages that lets the others know about it (see `lrt_check_completion`). This is ensured by the following order (a runnable sketch follows the list):

 * the completion message is always sent from process ''i'' to process ''i''+1
 * process 0 sends the completion message to process 1 as soon as it is finished
 * process ''i'', where ''i''=1...''n''-1, sends the message only after it is finished **and** it has already received the completion message from process ''i''-1
 * when process ''n'' (the last one) has received the message from ''n''-1 **and** is itself finished, it knows that every process is finished. It then sends the ''termination'' message to **every process**, including itself, in order to consume the last, already posted `MPI_IRECV` and to end the processing loop.
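
The completion and termination round can be illustrated by the following self-contained sketch, in which every process is trivially "finished" from the start. The tag values, the message payload and the overall program structure are illustrative assumptions, not the actual `lrt_check_completion` code.

{{{
!-- Runnable sketch of the completion/termination round for the trivial case
!-- that every process is finished right away (illustration only).
PROGRAM lrt_completion_sketch
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER ::  tag_completion = 11, tag_termination = 12
   INTEGER ::  dummy, i, ierr, myid, numprocs, recv_req
   INTEGER ::  stat(MPI_STATUS_SIZE)
   LOGICAL ::  terminated

   CALL MPI_INIT( ierr )
   CALL MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
   CALL MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
!
!-- Keep one receive posted for control messages (in the real scheme the same
!-- receive also takes the raytracing requests).
   CALL MPI_IRECV( dummy, 1, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG,        &
                   MPI_COMM_WORLD, recv_req, ierr )
!
!-- Process 0 starts the completion round as soon as it is finished.  If it is
!-- the only process, it is also the last one and terminates directly.
   IF ( myid == 0 )  THEN
      IF ( numprocs > 1 )  THEN
         CALL MPI_SEND( myid, 1, MPI_INTEGER, 1, tag_completion,              &
                        MPI_COMM_WORLD, ierr )
      ELSE
         CALL MPI_SEND( myid, 1, MPI_INTEGER, 0, tag_termination,             &
                        MPI_COMM_WORLD, ierr )
      ENDIF
   ENDIF

   terminated = .FALSE.
   DO WHILE ( .NOT. terminated )
      CALL MPI_WAIT( recv_req, stat, ierr )
      IF ( stat(MPI_TAG) == tag_termination )  THEN
!
!--      Do not re-post the receive; the processing loop ends here.
         terminated = .TRUE.
      ELSEIF ( stat(MPI_TAG) == tag_completion )  THEN
         CALL MPI_IRECV( dummy, 1, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG,  &
                         MPI_COMM_WORLD, recv_req, ierr )
         IF ( myid < numprocs-1 )  THEN
!
!--         Finished and holding the token from myid-1: pass it on to myid+1.
            CALL MPI_SEND( myid, 1, MPI_INTEGER, myid+1, tag_completion,      &
                           MPI_COMM_WORLD, ierr )
         ELSE
!
!--         Last process: everyone is finished, send the termination message
!--         to every process including itself to consume the posted receives.
            DO  i = 0, numprocs-1
               CALL MPI_SEND( myid, 1, MPI_INTEGER, i, tag_termination,       &
                              MPI_COMM_WORLD, ierr )
            ENDDO
         ENDIF
      ENDIF
   ENDDO

   CALL MPI_FINALIZE( ierr )
END PROGRAM lrt_completion_sketch
}}}

Passing the completion token along the chain of processes guarantees that the termination message is only issued once every process has reported that it is finished.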

When a process receives the termination message, it does not post another `MPI_IRECV`; it ends its processing loop, and the whole raytracing is done.
