Conclusion and Future Work

Porting the SHOC benchmarks to OpenSHMEM has demonstrated that teams and collective operations are beneficial for supporting hybrid programming with OpenSHMEM. To port these codes successfully, we implemented team data types and the team split, free, barrier, and translate PE operations. In addition, we implemented C++ template functions for team-based gather, sum reduction, prefix scan, and broadcast.
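The team operations listed above can be illustrated with a minimal single-process sketch. It models each team as an ordered list of global PE numbers and simulates per-PE data as a vector indexed by global PE, so no remote memory access or synchronization takes place. The names `Team`, `split_team`, `translate_pe`, `team_gather`, `team_reduce_sum`, `team_prefix_scan`, and `team_broadcast` are hypothetical. The color/key split semantics and the inclusive scan are assumptions and are not necessarily those of our implementation layer or of any OpenSHMEM teams proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical team: an ordered list of global PE numbers.
// Index i in `members` is the team-local rank of PE members[i].
struct Team {
    std::vector<int> members;
};

// Hypothetical split: PEs of `parent` with the same color form one team.
// Within a team, PEs are ordered by key, with ties broken by global PE number.
// `color` and `key` are indexed by global PE. Returns one Team per color.
std::map<int, Team> split_team(const Team& parent,
                               const std::vector<int>& color,
                               const std::vector<int>& key) {
    assert(color.size() == key.size());
    std::map<int, std::vector<std::pair<int, int>>> groups;  // color -> (key, PE)
    for (int pe : parent.members)
        groups[color[pe]].push_back({key[pe], pe});
    std::map<int, Team> teams;
    for (auto& [c, group] : groups) {
        std::sort(group.begin(), group.end());
        Team t;
        for (const auto& kp : group) t.members.push_back(kp.second);
        teams[c] = t;
    }
    return teams;
}

// Map a team-local rank in `src` to the rank of the same PE in `dst`,
// or -1 if that PE is not a member of `dst`.
int translate_pe(const Team& src, int src_rank, const Team& dst) {
    int pe = src.members.at(src_rank);
    auto it = std::find(dst.members.begin(), dst.members.end(), pe);
    return it == dst.members.end() ? -1 : static_cast<int>(it - dst.members.begin());
}

// Collect each member's value in team-rank order.
template <typename T>
std::vector<T> team_gather(const Team& team, const std::vector<T>& values) {
    std::vector<T> out;
    for (int pe : team.members) out.push_back(values[pe]);
    return out;
}

// Sum of the values held by all team members.
template <typename T>
T team_reduce_sum(const Team& team, const std::vector<T>& values) {
    T sum{};
    for (int pe : team.members) sum += values[pe];
    return sum;
}

// Inclusive prefix scan: result[r] is the sum over team ranks 0..r.
template <typename T>
std::vector<T> team_prefix_scan(const Team& team, const std::vector<T>& values) {
    std::vector<T> out;
    T running{};
    for (int pe : team.members) {
        running += values[pe];
        out.push_back(running);
    }
    return out;
}

// Copy the value held by the team-local root rank to every team member.
template <typename T>
void team_broadcast(const Team& team, std::vector<T>& values, int root_rank) {
    const T v = values[team.members.at(root_rank)];
    for (int pe : team.members) values[pe] = v;
}
```

For example, splitting a four-PE world with colors {0, 1, 0, 1} and equal keys yields the teams {0, 2} and {1, 3}; with per-PE values {10, 20, 30, 40}, a sum reduction over the first team gives 40 and an inclusive prefix scan over the second gives {20, 60}. A real implementation would perform each collective with one-sided puts or gets and team barriers rather than by indexing a shared vector.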

The performance of our implementation layer was adequate at low core counts. Larger job sizes will require more scalable implementations with lower-level optimizations.

The next phase of porting these benchmarks will be to investigate moving the SHMEM communication into the CUDA kernels themselves using NVSHMEM.
