Massively parallel computing
Modern N -body simulations are built on a computational paradigm called distributed computing. The supercomputer is defined as an ensemble of independent computers with their own individual processing units, interconnected by a high-speed network. There is a fundamental difference with another paradigm called grid computing, in the sense that the interconnecting network (in short, the interconnect) is of very high quality, usually defined by two critical numbers defining a communication network, namely the latency and the bandwidth. Massively parallel systems have interconnects with low latency and high bandwidth. Most interconnects in the HPC community are designed using Infiniband cables, cards, and switches, three hardware elements that are required to connect an ensemble of computers.
From the software point of view, distributed computing requires that the computational domain be split into independent regions of space, in an operation called domain decomposition. Particles and grid cells are therefore also distributed across the machine, each individual computer corresponding to a single region of space. Gravity calculations are very challenging to perform within this context, because they are dealing with a nonlocal phenomenon. FFT-based techniques require that big matrices be transposed across the whole machine, making intensive use of the network and hitting the wall of the limited bandwidth. Tree-based calculations are more efficient, because the information coming from distant processors can be compressed using tree cells and high-order multipoles. The amount of information to be transferred across the network is therefore strongly reduced. Multigrid solvers are also very appealing in this context, since they formally require exchange only of the information coming from direct neighbours, limiting the information to be transferred to a small number of planes one cell thick.
Efficient parallel computations can be achieved using these domain decomposition techniques. A problem arises when these independent domains become too small, so that the amount of computations becomes smaller than the amount of communication. Moreover, it is very difficult in practice to make sure that the amount of work performed in each processor remain exactly the same across all processors. This problem has nothing to do with communication across the network; rather, it is due to a poor load balancing of the work across the machine. This requires a very careful scheduling of the various tasks required to compute the gravity, and this is currently the main limitation of highly resolved N -body simulations.