Library Setup, Exit, Query Routines
The shmem_init routine retrieves or calculates the local processing element (PE) number (for shmem_my_pe) and number of PEs (for shmem_n_pes), configures the optimized hardware barrier or collective dissemination barrier arrays, obtains the SHMEM heap memory offset, and precalculates a few other addresses for improved runtime performance. The shmem_ptr routine can directly calculate remote memory locations using simple logical shift and bitwise operations.
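The shift-and-mask address calculation described for shmem_ptr can be sketched as follows. This is a minimal illustration, not the library's implementation. It relies on the Epiphany global address format, where the upper 12 bits of a 32-bit address hold the core ID (6-bit mesh row, 6-bit mesh column) and the lower 20 bits hold the core-local offset. The 4x4 array with its first core at mesh row 32, column 8 (as on Parallella boards), the row-major PE ordering, and the names pe_to_coreid and remote_address are assumptions made for this example. The result is returned as an integer rather than a pointer so that the sketch also runs on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: a 4x4 Epiphany-III array whose first core sits
 * at mesh row 32, column 8, with PEs numbered in row-major order. */
enum { MESH_ROW0 = 32, MESH_COL0 = 8, MESH_COLS = 4, N_PES = 16 };

/* Hypothetical helper: build the 12-bit core ID (6-bit row, 6-bit column)
 * for a PE using only shifts and masks. */
static uint32_t pe_to_coreid(unsigned pe)
{
    assert(pe < N_PES);
    uint32_t row = MESH_ROW0 + (pe >> 2);               /* pe / MESH_COLS */
    uint32_t col = MESH_COL0 + (pe & (MESH_COLS - 1));  /* pe % MESH_COLS */
    return (row << 6) | col;
}

/* Hypothetical stand-in for shmem_ptr: place the core ID in the upper
 * 12 bits and keep the lower 20 bits of the symmetric local address. */
static uint32_t remote_address(uint32_t local_addr, unsigned pe)
{
    return (pe_to_coreid(pe) << 20) | (local_addr & 0xFFFFFu);
}
```

Under these assumptions, local address 0x2000 on PE 0 maps to global address 0x80802000, and the same symmetric address on PE 5 maps to 0x84902000.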
Memory Management Routines
Memory management on the Epiphany processor is atypical. Each Epiphany-III core has a flat 32 KB local memory map from address 0x0000 to 0x7fff. Programs are typically loaded starting at 0x0100 if extremely constrained for memory, or 0x0400 if using the COPRTHR 2 interface. The stack pointer typically moves downward from the high address. Data used for the application, including the SHMEM data heap, begins directly after the program space. Figure 2 shows the typical memory layout of an Epiphany-III core using the COPRTHR 2 interface as it relates to the PGAS model. The static or global variables that are typically defined within the application appear below the free local memory address within the symmetric heap. They are still symmetrical across all Epiphany cores as the program binary is identical.
Fig. 2. The PGAS memory model (left) and the equivalent typical memory layout on an Epiphany-III core (right)
Due to the tight memory constraints, a more modern memory allocator was not addressed in this work. The basic memory management system calls brk and sbrk are more suited for controlling the amount of memory allocated from the SHMEM data heap for each processing element because there is no virtual address abstraction. Instead, there is a local base memory tracking pointer that stores the current free memory base address and is incremented with each allocation. The memory management routines build on these calls, but care must be taken to adhere to the following rules:
1. shmem_free must be called in the reverse order of allocation if making subsequent allocations
2. shmem_realloc can only be used on the last (re)allocated pointer
3. shmem_align alignment must be a power of 2 greater than 8 (default is 8)
This is a pragmatic approach that we feel is reasonable and that will go unnoticed in most codes. Calling shmem_free moves the local base memory tracking pointer to the address passed as its argument, so a routine that frees all of its memory only needs to call shmem_free once, on the first buffer allocated in the series. The shmem_realloc routine could be designed to copy the contents of the old buffer to a new buffer; however, this would waste the memory space of the original allocation (a precious commodity on the Epiphany architecture). Future developments with COPRTHR 2 may address these deficiencies by exporting the COPRTHR host-side memory management to the coprocessor threads.
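The tracking-pointer scheme and its three rules can be sketched as a small bump allocator. This is a hedged illustration only: the static heap array, its 4096-byte size, the free_base variable, and the sketch_* function names are all hypothetical and stand in for the SHMEM data heap and the shmem_malloc, shmem_align, shmem_free, and shmem_realloc routines. The real routines build on brk and sbrk, which this sketch does not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SHMEM data heap; the size is arbitrary. */
static unsigned char heap[4096];

/* Local base memory tracking pointer: the current free memory base address. */
static unsigned char *free_base = heap;

/* Allocate size bytes at a power-of-2 alignment by advancing free_base. */
static void *sketch_align(size_t alignment, size_t size)
{
    assert(alignment >= 8 && (alignment & (alignment - 1)) == 0);
    uintptr_t end = (uintptr_t)(heap + sizeof heap);
    /* Round the free base up to the requested alignment. */
    uintptr_t start = ((uintptr_t)free_base + alignment - 1)
                      & ~(uintptr_t)(alignment - 1);
    if (start > end || size > end - start)
        return NULL;                       /* out of heap space */
    free_base = (unsigned char *)start + size;
    return (void *)start;
}

/* Default allocation uses 8-byte alignment. */
static void *sketch_malloc(size_t size)
{
    return sketch_align(8, size);
}

/* Freeing rewinds free_base to ptr, which also releases every buffer
 * allocated after ptr (hence the reverse-order rule). */
static void sketch_free(void *ptr)
{
    if (ptr != NULL)
        free_base = (unsigned char *)ptr;
}

/* Only valid on the most recent allocation: resize in place, no copy. */
static void *sketch_realloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return sketch_malloc(size);
    unsigned char *p = ptr;
    if (size > (size_t)(heap + sizeof heap - p))
        return NULL;
    free_base = p + size;
    return ptr;
}
```

In this sketch, allocating buffers a and then b, growing b with sketch_realloc, and finally calling sketch_free(a) returns the tracking pointer to a in a single call, so the next allocation reuses the same address. Resizing in place keeps the original allocation's memory in use rather than abandoning it, which is the trade-off the text describes.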