The experiments were conducted on TACC’s Stampede supercomputer, ranked #12 on the TOP500 list (as of June 2016). Stampede contains 6400 dual-socket eight-core Sandy Bridge E5-2680 server nodes with 32 GB of memory each, called “compute nodes”, and 16 quad-socket eight-core Sandy Bridge E5-4650 server nodes at 2.7 GHz with 1 TB of memory each, called “large memory nodes”. The nodes are interconnected by InfiniBand HCAs in FDR mode, and the operating system is CentOS 6.4 with Linux kernel 2.6.32-431.el6. Experiments use version 2.5.5 of the Lustre parallel filesystem on Stampede.
For this evaluation, we use the Intel compiler version 18.104.22.168 on Stampede with the OpenSHMEM library. See  for a comparison of different OpenSHMEM implementations on Stampede. As the workload, we use a port of the NAS Parallel Benchmarks (NPB) to OpenSHMEM. The NAS Parallel Benchmarks for MPI are well documented and widely used [3,28,34]. The suite consists of parallel workloads designed to evaluate the performance of various hardware and software components of a parallel computing system.