Shared Notifications (GASPI-SHAN)
Traditionally, the GASPI programming model targets multi-threaded or task-based applications. To support the migration of legacy applications (with a flat MPI communication model) towards GASPI, we have extended the concept of shared MPI windows to a notified communication model in which the processes sharing a common window can make use of node-local notified communication. In addition, MPI derived data types can now be handled without additional data copies.
GASPI-SHAN directly exposes application data and data types in shared memory instead of developing a shared memory implementation for GASPI (which would have implied additional copies in shared memory). While the former concept of a shared memory implementation has to some extent been available in MPI since MPI-3.0, the latter concept does not yet exist. We have also extended the notified communication of GASPI to shared memory in the form of SHAred memory Notifications (SHAN for short). A shared memory notification consists of a write fence combined with an atomic increment. This yields a very fine-grained mechanism that is suitable for mutual process dependencies in shared memory and nicely complements the remote notified communication mechanism of GASPI.
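The "write fence plus atomic increment" idea can be illustrated with C11 atomics. The following is only a minimal sketch, not the GASPI-SHAN implementation: the type `shan_notification_t` and the functions `shan_notify` and `shan_test` are hypothetical names, and the sketch assumes that the counter lives in a shared-memory segment and that the platform's 64-bit atomics are lock-free and usable across processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical notification slot, assumed to be placed in shared memory. */
typedef struct {
    _Atomic uint64_t counter;
} shan_notification_t;

/* Producer side: a write fence orders all earlier data writes before the
 * atomic increment that publishes the notification. */
static void shan_notify(shan_notification_t *n)
{
    assert(n != NULL);
    atomic_thread_fence(memory_order_release);  /* write fence */
    atomic_fetch_add_explicit(&n->counter, 1, memory_order_relaxed);
}

/* Consumer side: non-blocking test. Returns 1 once the counter has reached
 * `expected`; the acquire fence then makes the producer's data visible. */
static int shan_test(shan_notification_t *n, uint64_t expected)
{
    assert(n != NULL);
    if (atomic_load_explicit(&n->counter, memory_order_relaxed) < expected)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    return 1;
}
```

Because the counter only ever grows, a reader can compare it against its own iteration count instead of resetting it, which avoids a second write to the same cache line.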
GASPI-SHAN communication: Only remote communication is routed through the network. Local communication is replaced by notifications in shared memory and direct access to the application data.
SHAN exposes the data types of the application in shared memory. This not only allows neighboring node-local ranks to directly access both the data types and the actual data, it also allows data type structures to change dynamically: the sizes of data elements, the number of elements and their offsets can be adjusted on the fly. For remote nodes, GASPI-SHAN assembles a packet header which contains the size of the data elements and their number.
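A packet for a remote node could, for example, be assembled as sketched below. The text only states that the header carries the element size and the element count; the struct layout `shan_header_t`, the function `shan_pack` and the convention that offsets are given in units of elements are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout: element size and element count. */
typedef struct {
    uint64_t elem_size;  /* size of one data element in bytes */
    uint64_t num_elems;  /* number of elements in the payload */
} shan_header_t;

/* Gathers `num` elements of `elem_size` bytes from `src` (selected by
 * element offsets) into `dst` as header + contiguous payload.
 * Returns the packet size in bytes, or 0 if `dst` is too small. */
static size_t shan_pack(void *dst, size_t dst_cap, const void *src,
                        const size_t *offsets, size_t num, size_t elem_size)
{
    assert(dst != NULL && src != NULL && offsets != NULL);
    size_t total = sizeof(shan_header_t) + num * elem_size;
    if (total > dst_cap)
        return 0;

    shan_header_t h = { (uint64_t)elem_size, (uint64_t)num };
    memcpy(dst, &h, sizeof h);

    unsigned char *out = (unsigned char *)dst + sizeof h;
    const unsigned char *in = (const unsigned char *)src;
    for (size_t i = 0; i < num; ++i)
        memcpy(out + i * elem_size, in + offsets[i] * elem_size, elem_size);
    return total;
}
```

Since the header travels with every packet, the receiver does not need to know the layout in advance, which is what makes on-the-fly changes of element size and count possible.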
A GASPI-SHAN application directly reads data from neighboring ranks on the same node. Sending of data is here replaced by local shared notifications, which signal that the data can be read by neighboring node-local ranks. Sending to remote ranks is performed by a double-buffered, one-sided notified GASPI write (write_notify). As all sending of data is one-sided, the problem of late receivers in two-sided communication is avoided entirely.
Receiving node-local data is replaced by first testing the validity of the above ‘can read’ notification and then converting the remote data types into a local data type. The receive also triggers (node-locally) a second notification for ‘have read’. Receiving remote data is handled by testing for completion of the remote notified GASPI communication. A GASPI notification received from another rank guarantees that the associated communication buffer is locally available on the receiving side.
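The node-local receive path described above (test ‘can read’, convert the layout, signal ‘have read’) might look roughly as follows. This is a sketch under assumptions: `shan_neighbor_t` and `shan_local_recv` are hypothetical names, the exposed data type is reduced to a list of element indices, and raw pointers are used for brevity (a real shared window mapped by several processes would store offsets relative to the window base instead).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-neighbor state exposed in the shared window. */
typedef struct {
    _Atomic uint64_t can_read;   /* incremented by the owner of the data    */
    _Atomic uint64_t have_read;  /* incremented by the reader when finished */
    const double *data;          /* owner's application data                */
    const size_t *offsets;       /* owner's layout as element indices       */
    size_t num;                  /* number of elements to read              */
} shan_neighbor_t;

/* Non-blocking receive for iteration `iter` (counted from 1).
 * Returns 0 if the owner has not yet released the data, 1 otherwise. */
static int shan_local_recv(shan_neighbor_t *nb, uint64_t iter, double *dst)
{
    assert(nb != NULL && dst != NULL);
    if (atomic_load_explicit(&nb->can_read, memory_order_acquire) < iter)
        return 0;                              /* 'can read' not yet set */

    for (size_t i = 0; i < nb->num; ++i)       /* owner layout -> local layout */
        dst[i] = nb->data[nb->offsets[i]];

    /* 'have read': tells the owner that its data may be overwritten. */
    atomic_fetch_add_explicit(&nb->have_read, 1, memory_order_release);
    return 1;
}
```

The reader gathers directly from the owner's application data, so no intermediate send or receive buffer is involved on the node-local path.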
Last but not least, communication in SHAN requires a confirmation for each send. The main reason is that a process must not overwrite data that other ranks are reading before all neighboring ranks have completed their reads. Testing for a completed send in SHAN is replaced by testing for the ‘have read’ shared notification from all neighbors. For remote ranks, the double-buffered communication guarantees race-free operation in bidirectional communication: as a remote communication partner requires a message from a previous communication in order to send, receiving a message from this communication partner implies that the send of the previous communication buffer is complete and that we can hence rewrite the corresponding send buffer.
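The two completion rules above can be sketched as follows. All names are hypothetical, and the remote rule is one possible reading of the double-buffering argument: iterations are counted from 1, iteration k uses buffer k mod 2, and the buffer last used in iteration k-2 is assumed to be reusable once the partner's message of iteration k-1 has arrived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Node-local: the send of iteration `iter` is complete once every neighbor
 * has signalled 'have read' at least `iter` times. */
static int shan_local_send_done(_Atomic uint64_t *have_read,
                                size_t num_neighbors, uint64_t iter)
{
    assert(have_read != NULL);
    for (size_t i = 0; i < num_neighbors; ++i)
        if (atomic_load_explicit(&have_read[i], memory_order_acquire) < iter)
            return 0;
    return 1;
}

/* Remote: alternate between two send buffers. */
static unsigned shan_send_buffer(uint64_t iter)
{
    return (unsigned)(iter & 1u);
}

/* Remote: may the send buffer for iteration `iter` be rewritten?
 * `received` is the number of messages received from the partner so far. */
static int shan_remote_buffer_free(uint64_t iter, uint64_t received)
{
    if (iter <= 2)                 /* first use of each of the two buffers */
        return 1;
    return received >= iter - 1;   /* partner's message iter-1 has arrived */
}
```

With this scheme no explicit acknowledgement message is needed for remote sends; the partner's regular data message doubles as the confirmation.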
Instead of implicit (via derived data types) or explicit packing and unpacking of communication data, an application can share information about node-local data layout, structure and computational state with the help of shared notifications. To simplify the porting effort for existing legacy applications, we have developed a corresponding interface for SHAred Notifications (GASPI-SHAN for short). At its core, GASPI-SHAN extends the notified communication of GASPI towards shared memory.
Initial discussions at the GASPI forum indicate that GASPI-SHAN will very likely become part of the GASPI standard library. The GASPI standard library will collect libraries based on GASPI which extend the standard for special use cases.
Our work on Fine-grained Task Completion and Notified Communication for Shared Windows (GASPI-NOCOS) has also been presented to the GASPI Forum.