Previous SHMEM Work with Teams, Collectives, and Hardware Accelerators

SHMEM Teams and Collectives. To overcome some of the difficulties experienced when using OpenSHMEM with accelerators, Knaak et al. [7] list a set of extensions along with microbenchmarks to test them. For synchronization in our collectives, we used a put followed by a wait many times, so a put with signal, as proposed in that work, would be very useful.
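
As an illustration (not code from either paper), the following hedged C sketch contrasts the put/fence/flag idiom with a put-with-signal operation; the buffer names are ours, and the signal variants shown are the ones later standardized in OpenSHMEM 1.5.

  #include <shmem.h>
  #include <stdint.h>

  #define N 1024

  static double   payload[N];   /* symmetric destination buffer         */
  static long     ready = 0;    /* symmetric completion flag            */
  static uint64_t sig   = 0;    /* symmetric signal word (1.5 variant)  */

  /* Idiom described above: put the data, fence, then put a flag. */
  void send_put_then_flag(const double *local, int peer)
  {
      shmem_double_put(payload, local, N, peer);
      shmem_fence();                    /* order the data before the flag */
      shmem_long_p(&ready, 1, peer);    /* second put acts as the signal  */
  }

  void recv_put_then_flag(void)
  {
      shmem_long_wait_until(&ready, SHMEM_CMP_EQ, 1);
  }

  /* Put with signal: one call delivers both the data and the flag. */
  void send_put_with_signal(const double *local, int peer)
  {
      shmem_double_put_signal(payload, local, N, &sig, 1,
                              SHMEM_SIGNAL_SET, peer);
  }

  void recv_put_with_signal(void)
  {
      shmem_signal_wait_until(&sig, SHMEM_CMP_EQ, 1);
  }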

As mentioned, the Cray Message Passing Toolkit recently added flexible process-group team operations [3], an API that is very similar to the one described in [7] and the one we implement for this work. Hanebutte et al. [5] further propose federations as a way to extend teams with topologies that group PEs within a team. Our work does not explore processor topologies.
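
For concreteness, here is a minimal, hedged sketch of team-based usage, written against the teams interface later standardized in OpenSHMEM 1.5; the Cray and [7] APIs differ in function names but are similar in spirit.

  #include <shmem.h>

  int main(void)
  {
      shmem_init();

      /* Split the world team into the even-numbered PEs
       * (start = 0, stride = 2). */
      shmem_team_t even_team;
      int npes = shmem_n_pes();
      shmem_team_split_strided(SHMEM_TEAM_WORLD, 0, 2, (npes + 1) / 2,
                               NULL, 0, &even_team);

      if (even_team != SHMEM_TEAM_INVALID) {
          /* Collectives are then issued on the team handle instead of
           * an active set spanning all PEs. */
          static long src = 1, dst = 0;
          shmem_long_sum_reduce(even_team, &dst, &src, 1);
          shmem_team_destroy(even_team);
      }

      shmem_finalize();
      return 0;
  }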

Regarding collectives, some work has been done to optimize existing OpenSHMEM collectives by mapping them to MPI [6]. We are not aware of any other work that addresses implementing MPI collectives using SHMEM teams.

SHMEM with Hardware Accelerators and Hybrid Programming Models. Baker et al. [2] ported an MPI + OpenMP application to SHMEM + OpenACC. Since they started with an application that did not use hardware acceleration, most of their effort focused on optimizing the OpenACC code for the NVIDIA hardware on the Cray XK7. For the SHMEM portion, they note the same patterns that we saw in SHOC: accelerator code is limited by the need to keep all communication outside of the accelerator kernels, and synchronization is required between kernel launches.
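
That host-staged pattern looks roughly like the hedged sketch below (our illustration, using a CUDA kernel launch rather than OpenACC; the kernel and buffer names are hypothetical): every exchange must pass through the host between launches.

  #include <cuda_runtime.h>
  #include <shmem.h>

  __global__ void compute_step(double *d_buf, int n);  /* defined elsewhere */

  /* h_buf is a private host buffer; h_recv must be symmetric
   * (e.g. allocated with shmem_malloc) because it is a put target. */
  void staged_exchange(double *d_buf, double *h_buf, double *h_recv,
                       int n, int peer)
  {
      compute_step<<<(n + 255) / 256, 256>>>(d_buf, n);
      cudaDeviceSynchronize();                      /* finish the kernel     */

      cudaMemcpy(h_buf, d_buf, n * sizeof(double),
                 cudaMemcpyDeviceToHost);           /* stage on the host     */
      shmem_double_put(h_recv, h_buf, n, peer);     /* host-side SHMEM put   */
      shmem_barrier_all();                          /* sync between launches */

      cudaMemcpy(d_buf, h_recv, n * sizeof(double),
                 cudaMemcpyHostToDevice);           /* back to the device    */
      /* ... next kernel launch follows ... */
  }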

To address the overhead of moving data between main memory and the GPU, NVIDIA GPUDirect technologies [8, 9, 11] allow data movement between nodes but still require CPU involvement. The proposed NVSHMEM model [10] moves SHMEM communication directly into the CUDA kernel and uses GPU-to-GPU communication to move data between devices without requiring the program to split up communication and kernel code. This model will be explored in the next phase of SHOC benchmark porting.
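
As a rough, hedged illustration of the kernel-initiated style (our sketch, not taken from [10]; the function names follow the NVSHMEM library as later released and may differ from the interface proposed at the time):

  #include <cuda_runtime.h>
  #include <nvshmem.h>

  __global__ void exchange(float *remote_buf, int peer)
  {
      int i = threadIdx.x;
      /* Each thread writes one element directly into the peer's symmetric
       * buffer from inside the kernel -- no host-side staging step. */
      nvshmem_float_p(&remote_buf[i], (float)i, peer);
  }

  int main(void)
  {
      nvshmem_init();
      int mype = nvshmem_my_pe();
      int peer = (mype + 1) % nvshmem_n_pes();

      /* Symmetric (remotely accessible) allocation on every PE. */
      float *buf = (float *)nvshmem_malloc(128 * sizeof(float));

      exchange<<<1, 128>>>(buf, peer);
      cudaDeviceSynchronize();      /* ensure the kernel's puts are issued */
      nvshmem_barrier_all();        /* complete communication across PEs   */

      nvshmem_free(buf);
      nvshmem_finalize();
      return 0;
  }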

 