In this section we present proposed API extensions to the OpenSHMEM specification, as well as two runtime implementations of those extensions.
The existing OpenSHMEM specification focuses on performing communication to and from processing elements (PEs) in a PGAS communication model. This work extends the OpenSHMEM specification with APIs both for creating asynchronously executing tasks and for declaring dependencies between communication and computation. In this section, we briefly cover the major API extensions. Due to space limitations, these descriptions are not intended to be a comprehensive specification of the new APIs.
In general, the semantics of OpenSHMEM APIs in AsyncSHMEM are the same as in any specification-compliant OpenSHMEM runtime. For collective routines, we expect that only a single call is made from each PE. The ordering of OpenSHMEM operations issued from independent tasks must be enforced using task-level synchronization constructs. For example, if a programmer requires that a shmem_fence call is made between two OpenSHMEM operations occurring in other tasks, it is the programmer's responsibility to declare inter-task dependencies that enforce that ordering. The atomicity of atomic OpenSHMEM operations is guaranteed relative to other PEs as well as relative to all threads.
void shmem_task_nbi(void (*body)(void *), void *user_data);
shmem_task_nbi creates an asynchronously executing task defined by the user function body, which is passed user_data when launched by the runtime.
void shmem_parallel_for_nbi(void (*body)(int, void *),
    void *user_data, int lower_bound, int upper_bound);
shmem_parallel_for_nbi provides a one-dimensional parallel loop construct for AsyncSHMEM programs, where the bounds of the parallel loop are defined by lower_bound and upper_bound. Each iteration of the parallel loop executes body and is passed both its iteration index and user_data.
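As a rough illustration of the calling convention, the sketch below passes a small argument struct through user_data and uses the iteration index to select the element to update. The shmem_parallel_for_nbi defined here is a serial stand-in written only for this example, not the AsyncSHMEM runtime; the half-open range [lower_bound, upper_bound) and the names square_args, square_body, and sum_of_squares are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for illustration only: runs every iteration on the caller.
 * ASSUMPTION: the range is half-open, [lower_bound, upper_bound). */
static void shmem_parallel_for_nbi(void (*body)(int, void *), void *user_data,
                                   int lower_bound, int upper_bound) {
    assert(body != NULL);
    for (int i = lower_bound; i < upper_bound; i++) {
        body(i, user_data);
    }
}

/* Hypothetical argument bundle handed to every iteration via user_data. */
typedef struct {
    int *values;
} square_args;

/* Loop body: receives its iteration index and the shared user_data. */
static void square_body(int i, void *user_data) {
    square_args *args = user_data;
    args->values[i] = i * i;
}

/* Fills values[0..n) with squares and returns their sum. */
static int sum_of_squares(int *values, int n) {
    square_args args = { values };
    shmem_parallel_for_nbi(square_body, &args, 0, n);
    /* With a genuinely non-blocking runtime, completion of the loop would
     * have to be awaited before the results are read here. */
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```

Because each iteration writes only its own element, the body needs no synchronization even if iterations were to run concurrently.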
void shmem_task_scope_begin();
void shmem_task_scope_end();
A pair of shmem_task_scope_begin and shmem_task_scope_end calls is analogous to a finish scope in the Habanero task-parallel programming model. shmem_task_scope_end blocks until all tasks transitively spawned since the last shmem_task_scope_begin have completed.
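The transitive behavior of a task scope can be sketched with a toy model. The three functions below are single-threaded stand-ins written for this example, not the AsyncSHMEM runtime: tasks are queued and only executed when the scope ends, whereas a real runtime may start them at any time after creation. The names pending_task, child, parent, and run_scope_example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical record of a task that has been created but not yet run. */
typedef struct {
    void (*body)(void *);
    void *user_data;
} pending_task;

static pending_task queue[MAX_PENDING];
static size_t head = 0, tail = 0;

/* Stand-in: enqueue the task instead of handing it to worker threads. */
static void shmem_task_nbi(void (*body)(void *), void *user_data) {
    assert(tail < MAX_PENDING);
    queue[tail].body = body;
    queue[tail].user_data = user_data;
    tail++;
}

/* Stand-in: nothing to record in this serial model. */
static void shmem_task_scope_begin(void) {}

/* Stand-in: drain the queue, including tasks spawned by running tasks. */
static void shmem_task_scope_end(void) {
    while (head < tail) {
        pending_task t = queue[head++];
        t.body(t.user_data);
    }
    head = tail = 0;
}

/* Each task bumps a shared counter. A multithreaded runtime would need an
 * atomic increment here; the serial stand-in does not. */
static void child(void *user_data) {
    (*(int *)user_data)++;
}

static void parent(void *user_data) {
    shmem_task_nbi(child, user_data); /* transitively spawned */
    shmem_task_nbi(child, user_data);
    (*(int *)user_data)++;
}

/* Returns the number of tasks that finished before the scope closed. */
static int run_scope_example(void) {
    int completed = 0;
    shmem_task_scope_begin();
    shmem_task_nbi(parent, &completed);
    shmem_task_scope_end(); /* waits for parent and both children */
    return completed;
}
```

In this model the scope end observes all three tasks, even though only the parent was created directly inside the scope.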
void shmem_task_nbi_when(void (*body)(void *), void *user_data, TYPE *ivar, int cmp, TYPE cmp_value);
The existing OpenSHMEM Wait APIs allow an OpenSHMEM PE to block and wait for a value in the symmetric heap to meet some condition.
The shmem_task_nbi_when API is similar but, rather than blocking, it makes the execution of an asynchronous task predicated on a condition. This is similar to the concept of promises and futures introduced in Sect. 2. This API also allows remote communication to create local work on a PE.
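A minimal model of a predicated task is sketched below for TYPE = int. Everything in it is a stand-in written for this example: demo_int_task_nbi_when, demo_poll, and the DEMO_CMP_* constants are hypothetical names (the constants play the role of OpenSHMEM's SHMEM_CMP_* comparison operators), and a plain local write stands in for a put from a remote PE into the symmetric heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison codes for this sketch. */
enum { DEMO_CMP_EQ, DEMO_CMP_NE, DEMO_CMP_GE };

/* Hypothetical record of a task waiting on a condition. */
typedef struct {
    void (*body)(void *);
    void *user_data;
    int *ivar;
    int cmp;
    int cmp_value;
    bool armed;
} predicated_task;

static predicated_task slot; /* one pending task is enough for the sketch */

/* Stand-in for an int instance of the API: register the task and return
 * immediately instead of blocking the caller. */
static void demo_int_task_nbi_when(void (*body)(void *), void *user_data,
                                   int *ivar, int cmp, int cmp_value) {
    assert(!slot.armed);
    slot = (predicated_task){ body, user_data, ivar, cmp, cmp_value, true };
}

static bool condition_holds(const predicated_task *t) {
    int v = *t->ivar;
    switch (t->cmp) {
        case DEMO_CMP_EQ: return v == t->cmp_value;
        case DEMO_CMP_NE: return v != t->cmp_value;
        case DEMO_CMP_GE: return v >= t->cmp_value;
        default:          return false;
    }
}

/* Stand-in for the runtime's progress loop: launch the task once its
 * condition is satisfied. */
static void demo_poll(void) {
    if (slot.armed && condition_holds(&slot)) {
        slot.armed = false;
        slot.body(slot.user_data);
    }
}

static void on_data_ready(void *user_data) {
    *(int *)user_data = 1;
}

/* Returns 1 if the task ran only after the flag was set. */
static int run_when_example(void) {
    int flag = 0;    /* would live in the symmetric heap in a real program */
    int handled = 0;
    demo_int_task_nbi_when(on_data_ready, &handled, &flag, DEMO_CMP_EQ, 1);
    demo_poll();
    assert(handled == 0); /* condition not met yet, task has not run */
    flag = 1;             /* models a put arriving from another PE */
    demo_poll();
    return handled;
}
```

The caller never blocks: registration returns at once, and the update to the flag is what makes the task eligible to run, which is how remote communication can create local work.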