Hybridizing pure MPI applications with tasks - new guide
In this Best Practice Guide for parallel application developers we show how to develop hybrid applications, taking the MPI version as the starting point and incorporating a task-based programming model such as OpenMP tasking. Some of the techniques presented in the guide rely on features that are not yet available in these two programming models; for those we use the OmpSs-2 model, which already incorporates them, as a representative task-based runtime system, together with an MPI interposition library that emulates the required services on top of MPI.
The content of this document synthesizes several publications presented in the context of the INTERTWinE project. The first presents the TAMPI (Task-Aware MPI) interposition library, which inserts task scheduling/switching points into the communication services of the MPI library. These points let applications exploit the ability of the runtime system to start executing another task while a send or receive operation is in flight. Because all worker threads can be kept busy executing other tasks, communication/computation overlap can be achieved simply by annotating as tasks the code that computes a given result and the code that receives its inputs or sends its outputs, with a considerable performance improvement.
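The overlap pattern described above can be sketched as follows. This is an illustrative fragment rather than a complete program: the buffer names, neighbor ranks, and compute functions are hypothetical, and it assumes the TAMPI library is linked in so that the blocking `MPI_Recv` inside a task suspends only that task, not the worker thread.

```c
/* Hypothetical halo-exchange step; names and counts are illustrative.
 * With TAMPI linked, the blocking MPI_Recv below becomes a task
 * switching point: the worker thread runs other ready tasks meanwhile. */
for (int n = 0; n < num_neighbors; n++) {
    #pragma omp task depend(out: recv_buf[n])
    {
        MPI_Recv(recv_buf[n], count, MPI_DOUBLE, neighbor[n], tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}

#pragma omp task depend(in: recv_buf[0])  /* runs as soon as its data arrives */
{
    compute_boundary(recv_buf[0]);        /* overlaps the remaining receives */
}

#pragma omp task                          /* needs no halo data at all */
{
    compute_interior();
}

#pragma omp taskwait
```

The key design point is that the data dependencies between receive tasks and compute tasks, not explicit waits, order the execution; the runtime interleaves communication and computation automatically.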
The second publication presented in this best practice guide introduces a new methodology for parallel decomposition: Hierarchical Domain Over-decomposition with Tasking (HDOT). This approach mimics, using the tasking model provided by the task-based runtime system, the domain decomposition already performed at the MPI level. For programmers of pure MPI applications this decomposition should feel familiar and, by following the same pattern as the first-level parallelization, it allows a better matching of the different phases that compose the algorithm.
This Best Practice Guide first describes the main paradigms of HPC application programming on cluster-based systems, including their main advantages and disadvantages; we then develop the methodology that mimics the parallel decomposition carried out by MPI (i.e., Hierarchical Domain Over-decomposition with Tasking), and finally we show how the interposition library enables closer interaction between the message-passing library and the task-based runtime system.
This Best Practice Guide also aims to complement the guides previously published by INTERTWinE on hybrid programming based on the Message Passing Interface (Best Practice Guide to Hybrid MPI + OpenMP Programming; Best Practice Guide for Writing MPI + OmpSs Interoperable Programs).