MPI plus GASPI

This Resource Pack helps developers create efficient, capability-scale applications by exploiting both GASPI and MPI in the communication- and performance-critical parts of their codes.

This page has the following sections:

  1. Motivation and Strategy
  2. Industrial Relevance
  3. Best Practice Guide
  4. Tutorials
  5. Applications and Kernels
  6. Resource Pack

Motivation and Strategy

Both MPI and GASPI address distributed-memory systems. MPI (http://mpi-forum.org/) has been considered the standard for writing parallel programs for distributed-memory systems for more than two decades. GASPI (http://www.gaspi.de/), the Global Address Space Programming Interface, is a modern specification of a compact API for the development of parallel applications; it aims at a paradigm shift from bulk-synchronous, two-sided communication patterns towards an asynchronous communication and execution model. GPI-2 (http://www.gpi-site.com/gpi2/) is the leading open-source implementation of the GASPI standard.

GASPI can be introduced gradually into an existing MPI-based code, moving only certain communication duties to GASPI. In this case, GASPI inherits its environment from MPI, and the hybrid MPI+GASPI program is launched in the same way as the original MPI code.
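A minimal sketch of such a hybrid start-up is shown below. It assumes GPI-2 has been built with its MPI interoperability support, so that gaspi_proc_init can reuse the environment set up by MPI_Init; everything else in the example is ordinary MPI and GASPI API usage.

/*
 * Minimal sketch of a hybrid MPI+GASPI start-up (assumes GPI-2 built with
 * MPI interoperability enabled, so GASPI can reuse the MPI environment).
 */
#include <mpi.h>
#include <GASPI.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);            /* MPI sets up the parallel environment */
  gaspi_proc_init(GASPI_BLOCK);      /* GASPI inherits it from MPI           */

  int          mpi_rank;
  gaspi_rank_t gaspi_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
  gaspi_proc_rank(&gaspi_rank);

  printf("MPI rank %d maps to GASPI rank %u\n", mpi_rank, (unsigned) gaspi_rank);

  gaspi_proc_term(GASPI_BLOCK);      /* shut GASPI down before MPI           */
  MPI_Finalize();
  return 0;
}

Launched with the site's usual MPI launcher, the program starts as an ordinary MPI job, and GASPI attaches to the same set of processes.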

Strategy

The GASPI standard promotes the use of one-sided communication, where one side, the initiator, has all the relevant information (what, where from, where to, how much, etc.) for performing the data movement. The benefit is that data movement is decoupled from synchronization between processes: a process can put or get data from remote memory without engaging the corresponding remote process and without a synchronization point for every communication request. However, some form of synchronization is still needed so that the remote process can be notified upon the completion of an operation.
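The sketch below illustrates such a pure one-sided put with GPI-2; the segment id, queue number, offsets and the closing barrier are illustrative choices, and the barrier merely stands in for the notification mechanism described next.

/*
 * Minimal sketch: a pure one-sided put with GPI-2. Rank 0 writes one double
 * into rank 1's segment; rank 1 takes no part in the transfer itself.
 */
#include <GASPI.h>

int main(int argc, char *argv[])
{
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_rank_t rank;
  gaspi_proc_rank(&rank);

  /* Every process exposes one RDMA-capable segment (id 0, 1 KiB). */
  gaspi_segment_create(0, 1024, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  gaspi_pointer_t seg_ptr;
  gaspi_segment_ptr(0, &seg_ptr);
  double *data = (double *) seg_ptr;

  if (rank == 0)
  {
    data[0] = 42.0;                     /* payload prepared locally          */

    /* The initiator alone states what, where from, where to and how much.  */
    gaspi_write(0, 0,                   /* local segment id, local offset    */
                1,                      /* target rank                       */
                0, 0,                   /* remote segment id, remote offset  */
                sizeof(double),
                0, GASPI_BLOCK);        /* queue, timeout                    */

    gaspi_wait(0, GASPI_BLOCK);         /* local completion of queue 0 only  */
  }

  /* Crude stand-in for remote-side synchronization; the notification
   * mechanism described in the next paragraph replaces this barrier.        */
  gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);

  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}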

GASPI provides so-called weak synchronization primitives, which update a notification on the remote side. The notification semantics are complemented by routines that wait for the update of a single notification or of a set of notifications. GASPI also allows for thread-safe handling of notifications, providing an atomic function that resets a local notification with a given ID and returns the notification value held before the reset. The notification procedures are one-sided and involve only the local process.
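The sketch below puts these pieces together in a minimal producer/consumer pattern with GPI-2: rank 0 fuses data movement and remote notification in a single gaspi_write_notify call, while rank 1 waits with gaspi_notify_waitsome and atomically resets the notification with gaspi_notify_reset. Segment and notification ids, sizes and values are illustrative.

/*
 * Minimal sketch: the one-sided put from above, now with weak synchronization.
 */
#include <GASPI.h>

int main(int argc, char *argv[])
{
  gaspi_proc_init(GASPI_BLOCK);

  gaspi_rank_t rank;
  gaspi_proc_rank(&rank);

  gaspi_segment_create(0, 1024, GASPI_GROUP_ALL,
                       GASPI_BLOCK, GASPI_MEM_INITIALIZED);

  gaspi_pointer_t seg_ptr;
  gaspi_segment_ptr(0, &seg_ptr);
  double *data = (double *) seg_ptr;

  if (rank == 0)
  {
    data[0] = 3.14;

    /* One-sided write plus notification id 0 with value 1 on the target.   */
    gaspi_write_notify(0, 0,            /* local segment id, local offset    */
                       1,               /* target rank                       */
                       0, 0,            /* remote segment id, remote offset  */
                       sizeof(double),
                       0, 1,            /* notification id, value            */
                       0, GASPI_BLOCK); /* queue, timeout                    */

    gaspi_wait(0, GASPI_BLOCK);         /* local completion only             */
  }
  else if (rank == 1)
  {
    gaspi_notification_id_t first;
    gaspi_notification_t    old_value;

    /* Wait until a notification in the range [0, 0 + 1) has been updated.   */
    gaspi_notify_waitsome(0, 0, 1, &first, GASPI_BLOCK);

    /* Thread-safe, atomic reset; returns the value held before the reset.   */
    gaspi_notify_reset(0, first, &old_value);

    /* data[0] now holds the value written by rank 0.                        */
  }

  gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);  /* keep both ranks alive      */
  gaspi_proc_term(GASPI_BLOCK);
  return 0;
}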

Hence, there is potential to enhance an application's performance by shifting to one-sided communication as in GASPI. There are two possibilities for such a shift:

  1. Rewriting large legacy MPI codes to use a different inter-node programming model is, in many cases, highly labour-intensive and, therefore, not appealing to developers.
  2. Replacing MPI with GASPI only in the performance-critical parts of those codes is an attractive solution from a practical perspective, but it requires both APIs to interoperate effectively and efficiently, both in sharing communication and in data management.

The INTERTWinE team focused on the second approach, given that it is rarely possible to re-implement an application from scratch. We aim to study the interoperability of GASPI and MPI in order to allow incremental porting of applications, starting with the communication- and performance-critical parts of a code.

Industrial Relevance

[Figure: strong scalability of the GPI-based RTM application]

The Message Passing Interface (MPI) has been considered the de-facto standard for writing parallel programs for clusters of computers for more than two decades. Although the API has become very powerful and rich, having passed through several major revisions, alternative models that take modern hardware architectures into account have evolved in parallel. GASPI is one such model.

GASPI aims at extreme scalability, high flexibility, and failure tolerance for parallel computing environments. As there are no equivalent mechanisms available in MPI today, GASPI can demonstrate superior scalability for a substantial range of applications.

GASPI has achieved a good level of uptake in industry, in sectors such as seismic imaging (RTM and GRT); finite-element applications; and computational fluid dynamics.

Legacy MPI applications can be improved significantly by revising critical communication regions of the code to use GASPI single-sided primitives, reducing unnecessary synchronisation and, as a result, boosting scalability. Vice versa, an interoperable solution allows existing GASPI applications to make better use of existing MPI code, such as MPI-based libraries.

Best Practice Guide

Details on combining GASPI with MPI as well as a case study can be found in the Best Practice Guide on MPI + GASPI by INTERTWinE.

Tutorials

Learning the GPI-2 and GASPI API and programming model is a quick process, particularly if one already has some background in parallel programming. The GASPI tutorial was conceived to teach new users how to compile and execute a GASPI program and to provide an overview of its major features. To access the tutorial, go to http://www.gpi-site.com/gpi2/tutorial.

Applications and Kernels

As GASPI decouples data movement from synchronization between processes, it is especially relevant for applications that rely on continuous halo communication between neighbours. We aim to reduce the synchronization between sub-domains by porting Ludwig's and iPIC3D's main halo-exchange routines from MPI to GASPI. In addition, we replace the MPI reductions with GASPI reductions in the iPIC3D linear solver, as sketched below. The current approach of coupling both APIs brings little benefit, because data must be unpacked from MPI datatypes and then packed into GASPI segments, which withholds some of GASPI's advantages. A new approach based on so-called shared notifications is under development. [iPIC3D] [Ludwig]
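As an illustration of the reduction swap mentioned above, the sketch below contrasts an MPI_Allreduce over doubles with the corresponding gaspi_allreduce; the function names, buffer names and element count are illustrative and are not taken from the iPIC3D source.

/*
 * Sketch of the reduction swap: an MPI_Allreduce over doubles and the
 * equivalent GASPI built-in reduction.
 */
#include <mpi.h>
#include <GASPI.h>

#define N 8   /* number of elements to reduce (illustrative) */

/* Global sum with MPI: every rank contributes 'local', all receive 'global'. */
void sum_mpi(const double *local, double *global)
{
  MPI_Allreduce(local, global, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
}

/* The same collective semantics expressed with gaspi_allreduce.              */
void sum_gaspi(double *local, double *global)
{
  gaspi_allreduce(local, global, N,
                  GASPI_OP_SUM, GASPI_TYPE_DOUBLE,
                  GASPI_GROUP_ALL, GASPI_BLOCK);
}

Note that gaspi_allreduce operates on a GASPI group (here GASPI_GROUP_ALL), which plays a role analogous to the MPI communicator in the MPI call.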

Software to support this Resource Pack can be downloaded from GitHub.

Resource Pack

The INTERTWinE GASPI and MPI Resource Pack contains the following:

  1. INTERTWinE Best Practice Guide for programming with GASPI and MPI
  2. INTERTWinE developers' commentary on several real-world software applications, to illustrate good practice for MPI plus GASPI:
    1. Ludwig, a versatile code for the simulation of Lattice-Boltzmann (LB) models in 3D on cubic lattices [Guide, Source Code].
    2. iPIC3D, a Particle-in-Cell (PIC) code for the simulation of space plasmas in space-weather applications, modelling the interaction between the solar wind and the Earth's magnetic field [Guide, Source Code].
For more details, please consult our deliverables:
  1. D5.4 Final report on application/kernel plans, evaluations and benchmark suite releases
  2. D5.3 Performance evaluation report
  3. D5.2 Interim report on application/kernel plans, evaluations and benchmark suite releases