The Fork-Join approach is an implementation of AsyncSHMEM that supports most of the proposed extensions from Sect. 3.1. It is open source and available at https://github.com/openshmem-org/openshmem-async.
This implementation integrates asynchronous task parallelism without changing the core OpenSHMEM runtime; changes are limited to the user-level APIs in OpenSHMEM. The goal of the Fork-Join implementation was to study the impact of supporting basic asynchronous tasking in OpenSHMEM. In this approach, only the main thread (or process) is allowed to perform OpenSHMEM communication operations (blocking puts and gets, collectives); the asynchronous child tasks are not allowed to perform communication.
The main thread creates child tasks by calling shmem_task_nbi or shmem_parallel_for_nbi, and these child tasks can in turn create arbitrarily nested tasks. Tasks are synchronized either explicitly, by creating task synchronization scopes with shmem_task_scope_begin and shmem_task_scope_end, or implicitly, by calling shmem_barrier_all. The shmem_init call starts a top-level synchronization scope by calling shmem_task_scope_begin internally. Each shmem_barrier_all call includes an implicit shmem_task_scope_end followed by shmem_task_scope_begin, i.e., it first closes the current synchronization scope and then starts a new one. The call to shmem_finalize internally calls shmem_task_scope_end to close the top-level synchronization scope. Within these implicit scopes, the programmer may create arbitrarily nested task synchronization scopes using shmem_task_scope_begin and shmem_task_scope_end.
We call this implementation of AsyncSHMEM a Fork-Join approach because the implicit task synchronization scopes integrated into shmem_barrier_all cause a join at each barrier while allowing asynchronous tasks to be forked between barriers. A typical usage of this implementation is shown in Fig. 1, which closely mirrors a hybrid OpenSHMEM+OpenMP programming model.
Fig. 1. Fork-Join asynchronous task programming model in OpenSHMEM. The intra-rank asynchronous child tasks cannot make any communication calls.
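The scope rules described above can be illustrated with a small single-threaded model. This is only a sketch and not the AsyncSHMEM implementation: every fj_-prefixed name is hypothetical, no OpenSHMEM calls are made, and tasks are simply deferred until the enclosing scope closes, whereas a real runtime would presumably run them concurrently on worker threads. What the sketch does preserve is the scope structure: initialization opens a top-level scope, each barrier closes the current scope and opens a new one, and finalization closes the top-level scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of Fork-Join task scopes. The fj_ names are
 * illustrative only and are NOT part of the AsyncSHMEM API. */
#define FJ_MAX_TASKS  256
#define FJ_MAX_SCOPES 32

typedef void (*fj_body_t)(void *);
typedef struct { fj_body_t body; void *arg; } fj_task_t;

static fj_task_t fj_queue[FJ_MAX_TASKS];     /* pending (deferred) tasks */
static size_t fj_queue_len = 0;
static size_t fj_scope_base[FJ_MAX_SCOPES];  /* queue length at scope begin */
static size_t fj_scope_depth = 0;

/* Open a scope: remember how many tasks were pending at this point. */
void fj_task_scope_begin(void) {
    assert(fj_scope_depth < FJ_MAX_SCOPES);
    fj_scope_base[fj_scope_depth++] = fj_queue_len;
}

/* Fork: record a task; in this model it runs when its scope closes. */
void fj_task_nbi(fj_body_t body, void *arg) {
    assert(fj_scope_depth > 0 && fj_queue_len < FJ_MAX_TASKS);
    fj_queue[fj_queue_len].body = body;
    fj_queue[fj_queue_len].arg = arg;
    fj_queue_len++;
}

/* Join: run every task created since the matching scope_begin,
 * including tasks that those tasks create while running. */
void fj_task_scope_end(void) {
    assert(fj_scope_depth > 0);
    size_t base = fj_scope_base[fj_scope_depth - 1];
    while (fj_queue_len > base) {
        fj_task_t t = fj_queue[--fj_queue_len];  /* copy before running */
        t.body(t.arg);
    }
    fj_scope_depth--;
}

/* Model of the implicit scopes described in the text. */
void fj_init(void)     { fj_task_scope_begin(); }
void fj_finalize(void) { fj_task_scope_end(); }
void fj_barrier_all(void) {
    fj_task_scope_end();    /* join all tasks forked since the last barrier */
    /* a real runtime would perform the inter-PE barrier here */
    fj_task_scope_begin();  /* open a fresh scope for the next phase */
}

static void add_one(void *arg) { (*(int *)arg)++; }

/* A child task that forks two nested tasks. */
static void spawn_two(void *arg) {
    fj_task_nbi(add_one, arg);
    fj_task_nbi(add_one, arg);
}

/* Returns the final counter; also reports its value before and after
 * the barrier so the join point is visible. */
int fj_run_example(int *before_barrier, int *after_barrier) {
    int counter = 0;
    fj_init();
    fj_task_nbi(add_one, &counter);
    fj_task_nbi(spawn_two, &counter);
    *before_barrier = counter;  /* tasks are still deferred in this model */
    fj_barrier_all();           /* implicit join: all three increments done */
    *after_barrier = counter;
    fj_task_scope_begin();      /* explicit nested scope */
    fj_task_nbi(add_one, &counter);
    fj_task_scope_end();        /* explicit join */
    fj_finalize();
    return counter;
}
```

Calling fj_run_example from a main function exercises one implicit join at the barrier and one explicit join at the nested scope. The observation that no task has run before the barrier is an artifact of the deferred model; the property that carries over is that all tasks forked in a scope, including nested ones, have completed once that scope ends.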