The INTERTWinE team works with real software and popular programming techniques to ensure that its focus is aligned with scientists’ pressing needs and applications.
The goal of the Co-Design applications is to provide a set of applications/kernels and design benchmarks that permit the exploration of interoperability issues. The applications are used to evaluate the enhancements to programming model APIs and runtime implementations, and to provide feedback on experience with the Resource Manager and Directory Cache service.
The Ludwig application simulates complex fluid mixtures. To study the interoperability between GASPI and MPI, the halo exchange required at each time-step of the simulation has been ported to GASPI. The original version of Ludwig uses MPI data types, which GASPI does not support because it works on segments of data. Data described by MPI data types therefore has to be unpacked and copied into a contiguous data segment. Even without optimisation, the asynchronous, one-sided communication of GASPI performs at the same level as optimised MPI for large messages. These studies lead to the recommendation that GASPI consider handling MPI data types transparently in its segments. Ludwig is also being used to study MPI plus OpenMP with the thread-multiple support level, which performs up to two times better than MPI plus OpenMP with thread-single, because communication sections can be performed by different OpenMP threads.
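The packing step mentioned for the Ludwig halo exchange can be illustrated with a minimal sketch. It is plain Python, not Ludwig code and not the GASPI API: the row-major grid layout, the helper names `pack_halo` and `unpack_halo`, and the flat list standing in for a contiguous segment are all assumptions made for illustration.

```python
def pack_halo(field, nx, ny, column):
    """Gather one column of a row-major nx-by-ny field into a contiguous buffer.

    A column is strided in memory (stride ny). A derived data type can describe
    that layout directly; a segment-based interface needs the values copied
    into contiguous storage first.
    """
    return [field[i * ny + column] for i in range(nx)]


def unpack_halo(field, nx, ny, column, buffer):
    """Scatter a contiguous buffer back into one strided column of the field."""
    for i in range(nx):
        field[i * ny + column] = buffer[i]


# Hypothetical 4x5 local domains, each with one halo column on either side.
nx, ny = 4, 5
left = [float(10 * i + j) for i in range(nx) for j in range(ny)]
right = [0.0] * (nx * ny)

# "Send" the last interior column of `left` into the left halo column of `right`.
segment = pack_halo(left, nx, ny, column=ny - 2)  # contiguous stand-in for a segment
unpack_halo(right, nx, ny, column=0, buffer=segment)
```

The extra copy in `pack_halo` is the cost that transparent handling of data types on the segment side would hide from the application.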
The Barcelona Application Repository contains a set of kernels based on the OmpSs programming model, e.g. Cholesky factorisation, matrix multiplication, and the heat and N-body benchmarks. The N-body simulation numerically approximates the evolution of a system of bodies in which each body continuously interacts with every other body. The communication pattern exchanges each process's particles in a half-duplex ring. The particles can be sent and received in blocks of N messages. Interoperability experiments with MPI and OpenMP show that performance decreases because the underlying CPU resources are not managed. The INTERTWinE Resource Manager will be evaluated for the N-body simulation.
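The ring exchange used by the N-body benchmark can be sketched as follows. This is a sequential simulation of the pattern in plain Python, not the benchmark's MPI or OmpSs code: the function names, the 1-D softened force law, and the representation of each process as a list entry are assumptions chosen only to show how every block of particles visits every process.

```python
def pair_force(xi, xj, mj, eps=1e-3):
    """Softened 1-D pull on a body at xi from a body of mass mj at xj."""
    d = xj - xi
    return mj * d / (abs(d) ** 3 + eps)  # zero when d == 0, so self-pairs add nothing


def ring_forces(blocks):
    """Accumulate all-pairs forces by circulating particle blocks around a ring.

    blocks[p] is the list of (position, mass) pairs owned by process p.
    """
    nproc = len(blocks)
    forces = [[0.0] * len(b) for b in blocks]
    visiting = list(blocks)  # block currently held by each process
    for _ in range(nproc):
        for p in range(nproc):
            for i, (xi, _) in enumerate(blocks[p]):
                for xj, mj in visiting[p]:
                    forces[p][i] += pair_force(xi, xj, mj)
        # Every process hands its visiting block to the next rank (one direction only).
        visiting = [visiting[(p - 1) % nproc] for p in range(nproc)]
    return forces


def direct_forces(blocks):
    """Reference: the same forces computed with a plain double loop over all bodies."""
    bodies = [b for block in blocks for b in block]
    return [sum(pair_force(xi, xj, mj) for xj, mj in bodies) for xi, _ in bodies]


blocks = [[(0.0, 1.0), (1.0, 2.0)], [(2.5, 1.0)], [(4.0, 3.0), (5.5, 1.0)]]
ring = [f for per_proc in ring_forces(blocks) for f in per_proc]
ref = direct_forces(blocks)
```

After as many steps as there are processes, each block has visited every rank, so `ring` agrees with `ref` up to floating-point summation order.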
PLASMA and DPLASMA are modern parallel libraries for numerical linear algebra with dense matrices. While PLASMA is aimed at shared memory architectures, DPLASMA extends these concepts to distributed memory environments. In the scope of INTERTWinE, we took part in converting PLASMA from its own runtime system, QUARK, to OpenMP task parallelism. DPLASMA relies on the PaRSEC runtime system, which uses MPI message passing internally. As both numerical libraries can be called from parallel applications, it is crucial for them to maintain smooth interoperability with MPI, OpenMP, OmpSs, StarPU and other parallel programming APIs.
iPIC3D is a C++ particle-in-cell (PIC) application, parallelised with MPI plus OpenMP work-sharing, for the simulation of space and fusion plasmas, such as the interaction between the solar wind and the Earth's magnetic field. Charged particles from the solar wind are represented as computational particles, which are advanced by solving the particles' equations of motion. Magnetic and electric fields are obtained by solving the Maxwell equations. Particles and fields interact through interpolation. In particular, the iPIC3D field code profits from a multi-threaded MPI approach. It remains to be studied why this approach gives slightly lower performance with OpenMP tasks.
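The coupling between particles and fields through interpolation can be illustrated with a minimal 1-D sketch. It is not iPIC3D code and does not reproduce its numerical scheme: the simple explicit kick-and-drift update, the periodic grid, the linear weighting, and the names `gather_field` and `push` are all assumptions chosen for brevity.

```python
def gather_field(E, dx, x):
    """Linearly interpolate the node-based field E to position x (periodic grid)."""
    n = len(E)
    s = (x / dx) % n   # position in units of grid cells
    i = int(s)         # index of the node to the left
    w = s - i          # fractional distance to that node
    i %= n             # guard against rounding at the upper edge
    return (1.0 - w) * E[i] + w * E[(i + 1) % n]


def push(x, v, E, dx, dt, qm):
    """Advance one particle by one explicit step: kick with the gathered field, then drift.

    qm is the charge-to-mass ratio of the computational particle.
    """
    v_new = v + qm * gather_field(E, dx, x) * dt
    x_new = (x + v_new * dt) % (len(E) * dx)
    return x_new, v_new


# Hypothetical field samples on a 4-node periodic grid with unit spacing.
E = [0.0, 1.0, 2.0, 3.0]
x1, v1 = push(x=1.5, v=0.0, E=E, dx=1.0, dt=0.5, qm=2.0)
```

In a full PIC cycle the reverse step, depositing particle charge and current onto the grid, uses the same interpolation weights before the field solve.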
The CFD solver for aeronautics is a hybrid unstructured solver for compressible flow based on the solution of the Navier-Stokes equations. Next-generation implicit methods are being investigated for a new flow solver, which will work in a multi-threaded fashion within single domains and can use either MPI or GASPI for network communication. INTERTWinE’s ambition is to evaluate the node-local scalability of implicit methods and the potential of task-based programming models. It is anticipated that the flat threading model currently employed by this next-generation CFD solver will not be suitable for upcoming systems with deep and fragmented memory hierarchies. Task graph models like OmpSs or StarPU are expected to be a very good match for these architectures. Owing to its focus on asynchronous, one-sided dataflow notifications, the GASPI API is expected to be an excellent match for the anticipated global extension of the OmpSs task graph model.
Computation with large-scale graphs (combinatorial computing) is crucial for Big Data analytics. Graph computations are often a source of poorly scalable parallel algorithms because of their irregular nature and low computational intensity. Nevertheless, many graph operations exhibit ample coarse-grained parallelism, which can be uncovered by exploiting the duality between graphs and sparse matrices. Currently, the size of the logical partitions run with OmpSs is being optimised. When OmpSs is combined with multi-threaded MKL, oversubscription is observed, caused by a bad mapping of threads to cores. INTERTWinE plans to address this oversubscription by using the Resource Manager.
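The duality between graphs and sparse matrices can be made concrete with breadth-first search, which is equivalent to repeated products of a sparse frontier vector with the adjacency matrix over a Boolean semiring. The sketch below is a plain-Python illustration under that assumption; it is not the project's kernel, and the CSR helper `to_csr` and the function `bfs_levels` are hypothetical names.

```python
def to_csr(n, edges):
    """Build CSR arrays (row_ptr, col_idx) for the adjacency matrix of a directed graph."""
    row_ptr = [0] * (n + 1)
    for u, _ in edges:
        row_ptr[u + 1] += 1
    for u in range(n):
        row_ptr[u + 1] += row_ptr[u]  # prefix sum gives the start of each row
    col_idx = [0] * len(edges)
    fill = row_ptr[:n]                # next free slot in each row (a copy)
    for u, v in edges:
        col_idx[fill[u]] = v
        fill[u] += 1
    return row_ptr, col_idx


def bfs_levels(n, row_ptr, col_idx, source):
    """BFS as repeated sparse vector-matrix products y = x A (OR of ANDs).

    The frontier holds the nonzeros of x; vertices that already have a level
    are masked out of y.
    """
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        nxt = []
        for u in frontier:                               # nonzeros of x
            for k in range(row_ptr[u], row_ptr[u + 1]):  # nonzeros in row u of A
                v = col_idx[k]
                if level[v] == -1:                       # mask: unvisited only
                    level[v] = depth
                    nxt.append(v)
        frontier = nxt
    return level


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
row_ptr, col_idx = to_csr(6, edges)
levels = bfs_levels(6, row_ptr, col_idx, source=0)
```

Each row of the matrix can be processed independently within a step, which is where the coarse-grained parallelism of the matrix formulation comes from.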
This document presents a first plan for the applications/kernels, together with their initial evaluation and the benchmark suite releases for use by WP3 and WP4. These initial releases of applications/kernels will be key to evaluating new or enhanced interoperability features that are made available in the respective runtime implementations. For each application/kernel, the document gives a brief introduction, describes the current state of the use case from the point of view of API combination, sets out the planned ambitions with respect to API combination within the scope of INTERTWinE, and reviews the benchmarks and evaluation plans. There is also a summary of software (benchmark) release plans, focusing on the API combinations.