Our goal is to provide performance data without relying on any special feature of a particular OpenSHMEM implementation, i.e., PSHMEM. At a high level, this involves two steps: constructing functionality similar to what the PSHMEM interface provides, and making it available to the application.
For every routine in the OpenSHMEM standard, PSHMEM provides an analogous routine with a slightly different name. We use symbol wrapping via the program linker to achieve the same effect. Nearly all program linkers support a -wrap foosym command line option that enables wrapping of the symbol foosym: any undefined reference to foosym is resolved to __wrap_foosym, and any undefined reference to __real_foosym is resolved to foosym. We use symbol wrapping to provide a unique wrapper function for each API function declared in an OpenSHMEM implementation’s header files. When the application’s object files are linked to form the executable file, a -wrap flag for every OpenSHMEM API routine is passed to the linker via the special @argfile syntax supported by most linkers.
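As a minimal sketch of how such an argument file could be produced, the C function below writes one -wrap line per routine name. The function name write_wrap_argfile and the one-flag-per-line layout are assumptions made for illustration, not the tooling described in this paper. If the file is passed through a compiler driver rather than directly to the linker, each flag would typically need a -Wl, prefix instead.

```c
#include <stdio.h>

/* Hypothetical helper: write one "-wrap <symbol>" line per routine name
 * to `out`, so that the resulting file can be handed to the linker with
 * the @argfile syntax.
 * Returns the number of lines written, or -1 on a write error. */
int write_wrap_argfile(FILE *out, const char *const *symbols, int count)
{
    int written = 0;
    for (int i = 0; i < count; i++) {
        if (fprintf(out, "-wrap %s\n", symbols[i]) < 0)
            return -1;
        written++;
    }
    return written;
}
```

Called with the names shmem_int_put and shmem_int_get, the function emits two lines, "-wrap shmem_int_put" and "-wrap shmem_int_get".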
Fig. 1. Symbol wrapping via the program linker replacing a call to shmem_int_put with a wrapper function at link time. The wrapper function uses TAU to record performance data and invokes the original shmem_int_put.
Figure 1 demonstrates symbol wrapping with an OpenSHMEM application that is statically linked against the OpenSHMEM implementation library libopenshmem.a. At link time, the call to shmem_int_put in the application is replaced with a call to __wrap_shmem_int_put, which is implemented in the libTau-shmem-wrap.a wrapper library. The wrapper function uses TAU to record performance data and invokes __real_shmem_int_put, which the linker replaces with a call to the original shmem_int_put as defined in libopenshmem.a.
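The shape of such a wrapper can be sketched in C as follows. This is an illustration under stated assumptions, not the contents of libTau-shmem-wrap.a: the counters demo_put_calls and demo_put_bytes and the helper record_put are hypothetical stand-ins for the TAU measurement calls, and __real_shmem_int_put is defined locally only so that the sketch compiles without an OpenSHMEM library. In an actual build that stand-in would be removed and the linker's -wrap option would resolve __real_shmem_int_put to the library's shmem_int_put.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical measurement state standing in for TAU (not the TAU API). */
unsigned long demo_put_calls = 0;
size_t demo_put_bytes = 0;

/* Hypothetical helper: record one put operation of `nbytes` bytes. */
static void record_put(size_t nbytes)
{
    demo_put_calls++;
    demo_put_bytes += nbytes;
}

/* Stand-in for the original routine so this sketch links on its own.
 * It only copies locally and ignores the target PE. With -wrap
 * shmem_int_put, this definition is dropped and the linker resolves
 * __real_shmem_int_put to shmem_int_put in the OpenSHMEM library. */
void __real_shmem_int_put(int *dest, const int *source, size_t nelems, int pe)
{
    (void)pe;
    memcpy(dest, source, nelems * sizeof(int));
}

/* Wrapper that the linker substitutes for the application's calls to
 * shmem_int_put: record performance data, then forward to the original. */
void __wrap_shmem_int_put(int *dest, const int *source, size_t nelems, int pe)
{
    record_put(nelems * sizeof(int));
    __real_shmem_int_put(dest, source, nelems, pe);
}
```

The wrapper keeps the signature of the wrapped routine, so the application needs no source changes; only the link step differs.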
Symbol wrapping works equally well for statically linked applications and dynamically linked applications that statically link against the OpenSHMEM implementation. However, applications that link dynamically against libopenshmem.so should use library preloading instead of symbol wrapping because symbol wrapping will only intercept SHMEM calls made from the application itself.