Training

The following courses on INTERTWinE-related topics contained material produced by project partners and/or were taught by project partners.

Upcoming training events

There are no more training courses taking place in the context of the INTERTWinE project, which finishes on 30 September 2018.

Past training events

Where available, links are provided to the course materials as a refresher for those who attended, or for anyone who would like to get an idea of what is covered in the course. We would always recommend attending the courses in person to gain a full understanding of the subject matter and to have the opportunity to ask for help or discuss any issues with the trainers.

25 April 2016: GASPI Tutorial at EASC 2016 @ KTH (Stockholm, Sweden) - course given by T-Systems SfR

In this tutorial we present an asynchronous dataflow programming model for Partitioned Global Address Spaces (PGAS) as an alternative to the programming model of MPI. GASPI, which stands for Global Address Space Programming Interface, is a PGAS API designed as a C/C++/Fortran library and focused on three key objectives: scalability, flexibility and fault tolerance. In order to achieve its much improved scaling behaviour, GASPI aims at asynchronous dataflow with remote completion rather than bulk-synchronous message exchanges. GASPI follows a single/multiple program multiple data (SPMD/MPMD) approach and offers a small, yet powerful API (see also http://www.gaspi.de and http://www.gpi-site.com). Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of GASPI.
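
To give a flavour of this notified-communication style, here is a minimal C sketch (an illustration only, not tutorial material; it assumes the GPI-2 implementation of GASPI and its GASPI.h header, and the segment size, segment id, notification id and queue number are arbitrary values chosen for the example). Each rank writes one double into its right neighbour's segment with gaspi_write_notify and then waits only for the incoming notification; no matching receive call is needed.

  /* Minimal GASPI sketch: a one-sided write with remote-completion notification.
   * Assumptions: the GPI-2 implementation of GASPI (header GASPI.h), launched
   * with gaspi_run; segment id 0, notification id 0, queue 0 and the 1 MiB
   * segment size are arbitrary illustration values. */
  #include <GASPI.h>

  int main(void)
  {
    gaspi_proc_init(GASPI_BLOCK);

    gaspi_rank_t rank, nranks;
    gaspi_proc_rank(&rank);
    gaspi_proc_num(&nranks);

    const gaspi_segment_id_t seg = 0;
    gaspi_segment_create(seg, 1 << 20, GASPI_GROUP_ALL,
                         GASPI_BLOCK, GASPI_MEM_INITIALIZED);

    gaspi_pointer_t ptr;
    gaspi_segment_ptr(seg, &ptr);
    ((double *)ptr)[0] = (double)rank;   /* payload: first double of the segment */

    const gaspi_rank_t right = (rank + 1) % nranks;

    /* Write our payload into offset 8 of the right neighbour's segment and
     * attach notification 0 to the transfer (remote completion). */
    gaspi_write_notify(seg, 0, right, seg, sizeof(double), sizeof(double),
                       0, 1, 0, GASPI_BLOCK);

    /* Wait for the notification posted into our own segment by the left
     * neighbour: once it is visible, the data has already arrived, so no
     * receive call is needed. */
    gaspi_notification_id_t got;
    gaspi_notification_t val;
    gaspi_notify_waitsome(seg, 0, 1, &got, GASPI_BLOCK);
    gaspi_notify_reset(seg, got, &val);

    gaspi_wait(0, GASPI_BLOCK);          /* flush the queue before shutdown */
    gaspi_proc_term(GASPI_BLOCK);
    return 0;
  }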

11-12 May 2016: Heterogeneous Programming on GPUs with MPI + OmpSs @ BSC (Barcelona, Spain)

The tutorial will demonstrate the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs.

More specifically, the tutorial will:

  • Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems
  • Demonstrate how to use MPI/OmpSs (a rough sketch follows this list) to:
    • incrementally parallelize/optimize MPI applications on clusters of SMPs, and
    • leverage CUDA kernels with OmpSs on clusters of GPUs
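
As a rough sketch of that incremental MPI/OmpSs style (an illustration under assumptions, not tutorial material): the task pragmas below use OmpSs (Mercurium/Nanos++) array-section syntax, MPI is assumed to have been initialised with MPI_THREAD_MULTIPLE, and compute_block() is a hypothetical application kernel.

  /* Rough sketch of the incremental MPI + OmpSs style introduced above:
   * communication and computation are expressed as tasks with data
   * dependences so that the runtime can order and overlap them.
   * Assumptions: OmpSs (Mercurium/Nanos++) array-section syntax "[start;count]",
   * MPI initialised with MPI_THREAD_MULTIPLE, and compute_block() is a
   * hypothetical application kernel, not a library routine. */
  #include <mpi.h>

  #define N 1024

  void compute_block(double *block, const double *halo);   /* hypothetical kernel */

  void exchange_and_compute(double block[N], double halo[N], int rank, int nranks)
  {
    int left  = (rank + nranks - 1) % nranks;
    int right = (rank + 1) % nranks;

    /* Task 1: exchange halos with the neighbours (MPI_Sendrecv avoids
       ordering deadlocks inside the task). */
    #pragma omp task in(block[0;N]) out(halo[0;N])
    MPI_Sendrecv(block, N, MPI_DOUBLE, right, 0,
                 halo,  N, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Task 2: starts only once its input halo has arrived; the runtime can
       keep other tasks running in the meantime. */
    #pragma omp task in(halo[0;N]) inout(block[0;N])
    compute_block(block, halo);

    #pragma omp taskwait
  }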

6-7 June 2016: Runtime systems for heterogeneous platform programming @ Maison de la Simulation (Gif-sur-Yvette, France) - course given by INRIA

This course will present the state of the art of runtime system support for programming heterogeneous platforms. Heterogeneous computing platforms—such as multicores equipped with accelerators—are notoriously difficult to program due to the strong differences in performance characteristics among the various available computing units and also to the discrete memory spaces of accelerating boards.
The course will present the StarPU runtime system developed at Inria by the STORM team in Bordeaux. It will also present the hwloc hardware locality library for discovering hardware resources, the TreeMatch framework (Bordeaux/Tadaam) for distributed process placement, and the Ezperf framework for diagnosing performance issues (SED Bordeaux).
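
To give a taste of the StarPU programming style, the sketch below registers a vector with the runtime and submits a single task through a codelet (illustrative code only; scale_cpu, the codelet and the data are invented for this example). Adding a CUDA implementation to the same codelet's cuda_funcs field is what lets StarPU schedule the same task graph across CPUs and GPUs.

  /* Minimal StarPU sketch: one codelet with a CPU implementation, applied to
   * a registered vector via starpu_task_insert(). Everything here is
   * illustrative rather than taken from the course. */
  #include <starpu.h>

  static void scale_cpu(void *buffers[], void *cl_arg)
  {
    (void)cl_arg;
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float   *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    for (unsigned i = 0; i < n; i++)
      v[i] *= 2.0f;                       /* the "kernel": scale the vector */
  }

  static struct starpu_codelet scale_cl = {
    .cpu_funcs = { scale_cpu },           /* a .cuda_funcs entry would add a GPU variant */
    .nbuffers  = 1,
    .modes     = { STARPU_RW },
  };

  int main(void)
  {
    float v[1024];
    for (int i = 0; i < 1024; i++) v[i] = (float)i;

    if (starpu_init(NULL) != 0) return 1;

    starpu_data_handle_t handle;
    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)v, 1024, sizeof(float));

    /* The runtime picks a worker (a CPU core here, or a GPU if a CUDA
       variant existed) and moves the data there automatically. */
    starpu_task_insert(&scale_cl, STARPU_RW, handle, 0);

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
  }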

6-7 June 2016: Efficient Parallel Programming with GASPI @ HLRS (Stuttgart, Germany) - course given by Fraunhofer ITWM

In this tutorial we present an asynchronous data flow programming model for Partitioned Global Address Spaces (PGAS) as an alternative to the programming model of MPI.
GASPI, which stands for Global Address Space Programming Interface, is a partitioned global address space (PGAS) API. The GASPI API is designed as a C/C++/Fortran library and focused on three key objectives: scalability, flexibility and fault tolerance. In order to achieve its much improved scaling behaviour GASPI aims at asynchronous dataflow with remote completion, rather than bulk-synchronous message exchanges. GASPI follows a single/multiple program multiple data (SPMD/MPMD) approach and offers a small, yet powerful API (see also http://www.gaspi.de and http://www.gpi-site.com). Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of GASPI.

2-4 August 2016: Advanced OpenMP @ Cray UK Ltd EMEA Headquarters (Bristol, UK) - course led by EPCC

OpenMP is the industry standard for shared-memory programming, which enables serial programs to be parallelised using compiler directives. This course is aimed at programmers seeking to deepen their understanding of OpenMP and explore some of its more recent and advanced features.

This 3-day course will cover topics including nested parallelism, OpenMP tasks, the OpenMP memory model, performance tuning, hybrid OpenMP + MPI, OpenMP implementations, and new features in OpenMP 4.0. Hands-on practical programming exercises make up a significant, and integral, part of this course.

The course materials can be found at: http://www.archer.ac.uk/training/course-material/2016/08/160802_AdvOpenMP_Bristol/index.php
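
One of the topics listed above, OpenMP tasking, can be illustrated with a minimal sketch (not taken from the course materials): three tasks with OpenMP 4.0 depend clauses form a producer/consumer chain that the runtime must respect, whichever threads execute them.

  /* Minimal OpenMP tasking sketch (compile with -fopenmp): the depend
   * clauses force the producer -> transform -> consumer ordering while
   * leaving the runtime free to choose the executing threads. */
  #include <stdio.h>

  int main(void)
  {
    int x = 0, y = 0;

    #pragma omp parallel
    #pragma omp single
    {
      #pragma omp task depend(out: x)                 /* producer */
      x = 42;

      #pragma omp task depend(in: x) depend(out: y)   /* transform */
      y = x + 1;

      #pragma omp task depend(in: y)                  /* consumer */
      printf("y = %d\n", y);

      #pragma omp taskwait
    }
    return 0;
  }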

29-30 September 2016: Advanced MPI @ EPCC (Edinburgh, UK)

This course covers advanced MPI topics, including:
  • parallel collective I/O using MPI I/O functionality,
  • sparse communication topologies using MPI neighbourhood collective functionality, and
  • single-sided communication using MPI RMA functionality.

The course materials can be found at: http://www.archer.ac.uk/training/course-material/2016/09/160929_AdvMPI_EPCC/index.php
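
To illustrate the single-sided (RMA) item above, here is a minimal sketch (not taken from the course materials): each rank exposes one integer in a window and puts its own rank into its right neighbour's window, with MPI_Win_fence providing the synchronisation.

  /* Minimal MPI RMA sketch (compile with mpicc): fence-synchronised MPI_Put
   * into the next rank's window. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = -1;               /* exposed memory, written by our left neighbour */
    MPI_Win win;
    MPI_Win_create(&local, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);        /* open the access/exposure epoch */
    int right = (rank + 1) % size;
    MPI_Put(&rank, 1, MPI_INT, right, 0 /* target displacement */, 1, MPI_INT, win);
    MPI_Win_fence(0, win);        /* close the epoch: the put is now visible */

    printf("rank %d received %d from its left neighbour\n", rank, local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
  }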

6-7 October 2016: Efficient Parallel Programming with GASPI @ Fraunhofer-Zentrum (Kaiserslautern, Germany)

The GASPI standard is developed and maintained as an open standard by the GASPI Forum. The HPC programmers of tomorrow will have to write codes that are able to deal with systems hundreds of times larger than the top supercomputers of today. In this tutorial we present an asynchronous dataflow programming model for Partitioned Global Address Spaces (PGAS). Interoperability with the current programming model standard MPI and with threading models will be highlighted during the course; however, no previous knowledge of MPI is assumed. GASPI, which stands for Global Address Space Programming Interface, is a PGAS API.

30 November 2016: Programming-model design and implementation for the Exascale @ ENEA (Rome, Italy) - course given by Inria

A two-hour tutorial on StarPU, given as part of the 3rd Technical Meeting of EoCoE, the Energy-oriented Centre of Excellence for computer applications - http://www.eocoe.eu/. Registration is by invitation; EoCoE members may register by following the link on the following webpage: http://ict.enea.it/eocoe/rome-meeting.

2-3 March 2017: GASPI - Global Address Space Programming Interface @ IT4I (Ostrava, Czech Republic) - course given by T-Systems SfR

In this tutorial we present an asynchronous data flow programming model for Partitioned Global Address Spaces (PGAS) as an alternative to the programming model of MPI. GASPI (Global Address Space Programming Interface) is a PGAS API which is designed as a C/C++/Fortran library and focused on three key objectives: scalability, flexibility and fault tolerance. In order to achieve its much improved scaling behaviour, GASPI aims at asynchronous dataflow with remote completion rather than bulk-synchronous message exchanges. GASPI follows a single/multiple program multiple data (SPMD/MPMD) approach and offers a small, yet powerful API (see also http://www.gaspi.de and http://www.gpi-site.com). GASPI is successfully used in academic and industrial simulation applications. The GASPI API works especially well for complex network topologies such as the 7D enhanced hypercube of Salomon (see e.g. https://github.com/PGAS-community-benchmarks/Pipelined-Transpose/wiki). Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of GASPI.

For further information, please see http://prace.it4i.cz/GASPI-03-2017

27-28 March 2017: Single-sided PGAS Communication Libraries @ University of Warwick - course given by the EPCC ARCHER CSE team and the GASPI development team

In some applications, the overheads associated with the fundamentally two-sided (send and receive) nature of MPI message-passing can adversely affect performance. This is of particular concern for scaling up to extremely large systems. There is a renewed interest in simpler single-sided communications models where data can be written/read directly to/from remote processes. This two-day course covers two single-sided Partitioned Global Address Space (PGAS) libraries: OpenSHMEM http://www.openshmem.org/ on day 1, and GASPI http://www.gaspi.de/ on day 2.

Hands-on practical sessions will play a central part in the course, illustrating key issues such as the need for appropriate synchronisation to ensure program correctness. All the exercises can be undertaken on ARCHER using C, C++ or Fortran. The OpenSHMEM material will be delivered by the ARCHER CSE team; the GASPI material will be delivered by members of the GASPI development team.

The course materials can be found at: http://www.archer.ac.uk/training/course-material/2017/03/PGAS_Warwick/
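
As a flavour of the day-1 OpenSHMEM material (a minimal sketch, not taken from the course materials): memory is allocated on the symmetric heap, a one-sided put writes into the next PE, and barriers provide the synchronisation needed for correctness, echoing the emphasis above on appropriate synchronisation.

  /* Minimal OpenSHMEM sketch: symmetric allocation, a one-sided put to the
   * next PE, and barriers to make the write safe and visible. */
  #include <stdio.h>
  #include <shmem.h>

  int main(void)
  {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric heap allocation: the same symmetric address is valid on every PE. */
    int *dest = (int *)shmem_malloc(sizeof(int));
    *dest = -1;
    shmem_barrier_all();            /* all PEs have allocated and initialised */

    int src = me;
    shmem_int_put(dest, &src, 1, (me + 1) % npes);   /* one-sided write, no receive */

    shmem_barrier_all();            /* synchronise so the put is complete and visible */
    printf("PE %d got %d\n", me, *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
  }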

26-30 June 2017: PUMPS Summer School @ BSC (Barcelona, Spain) - course given by BSC

The eighth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators. The final day of this week-long summer school focused on OmpSs.

3-4 July 2017: Efficient Parallel Programming with GASPI @ HLRS (Stuttgart, Germany) - course given by T-Systems SfR and Fraunhofer ITWM

In this tutorial we present an asynchronous data flow programming model for Partitioned Global Address Spaces (PGAS) as an alternative to the programming model of MPI. GASPI, which stands for Global Address Space Programming Interface, is a PGAS API designed as a C/C++/Fortran library and focused on three key objectives: scalability, flexibility and fault tolerance. In order to achieve its much improved scaling behaviour, GASPI aims at asynchronous dataflow with remote completion rather than bulk-synchronous message exchanges. GASPI follows a single/multiple program multiple data (SPMD/MPMD) approach and offers a small, yet powerful API (see also http://www.gaspi.de and http://www.gpi-site.com). GASPI is successfully used in academic and industrial simulation applications.

Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of GASPI. This course provides scientific training in computational science and, in addition, an opportunity for scientific exchange among the participants.

For further details, please see http://www.hlrs.de/training/2017-07-03-gaspi/.

12-13 September 2017: Advanced MPI @ University of Cambridge (Cambridge, UK) - course given by EPCC

This 2-day course is aimed at programmers seeking to deepen their understanding of MPI and explore some of its more recent and advanced features. We cover topics including communicator management, non-blocking and neighbourhood collectives, MPI-IO, single-sided MPI and the new MPI memory model. We also look at performance aspects, such as which MPI routines to use for scalability, overlapping communication and calculation, and MPI internal implementation issues.

The course materials can be found at: http://www.archer.ac.uk/training/course-material/2017/09/advmpi-camb/index.php
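
One of the performance aspects mentioned above, overlapping communication and calculation, can be sketched with a non-blocking collective (illustrative code, not course material; independent_work() is a hypothetical local computation):

  /* Sketch of communication/computation overlap: start a non-blocking
   * reduction, do unrelated local work while it progresses, then wait. */
  #include <mpi.h>
  #include <stdio.h>

  static double independent_work(void)       /* hypothetical local computation */
  {
    double s = 0.0;
    for (int i = 0; i < 1000000; i++) s += 1.0 / (i + 1);
    return s;
  }

  int main(int argc, char *argv[])
  {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0, global = 0.0;
    MPI_Request req;

    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    double other = independent_work();        /* overlapped with the reduction */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    if (rank == 0)
      printf("sum = %f (overlapped work gave %f)\n", global, other);

    MPI_Finalize();
    return 0;
  }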

23-27 October 2017: Parallel Programming Workshop (including one full day on OmpSs) @ BSC (Barcelona, Spain)

The objectives of this course are to understand the fundamental concepts supporting message-passing and shared-memory programming models. The course covers the two widely used programming models: MPI for distributed-memory environments and OpenMP for shared-memory architectures. It also presents the main tools developed at BSC for obtaining information about and analyzing the execution of parallel applications, Paraver and Extrae. Moreover, it sets out the basic foundations of task decomposition and parallelization inhibitors, using Tareador, a tool for analyzing potential parallelism and dependences.

Additionally, it presents the Parallware compiler, which is able to automatically parallelize a large number of program structures and provide hints to the programmer on how to change the code to improve parallelization. The course also deals with debugging alternatives, including the use of GDB and Totalview. The use of OpenMP in conjunction with MPI to better exploit the shared-memory capabilities of current compute nodes in clustered architectures is also considered. Paraver will be used throughout the course as the tool for understanding the behavior and performance of parallelized codes.

12-14 December 2017: Advanced OpenMP @ Imperial College (London, UK) - course given by EPCC

OpenMP is the industry standard for shared-memory programming, which enables serial programs to be parallelised using compiler directives. This course is aimed at programmers seeking to deepen their understanding of OpenMP and explore some of its more recent and advanced features.

This 3-day course will cover topics including nested parallelism, OpenMP tasks, the OpenMP memory model, performance tuning, hybrid OpenMP + MPI, OpenMP implementations, and new features in OpenMP 4.0. Hands-on practical programming exercises make up a significant, and integral, part of this course.

Attendees should be familiar with the basics of OpenMP, including parallel regions, data scoping, work sharing directives and synchronisation constructs. Access will be given to appropriate hardware for all the exercises, although many of them can also be performed on a standard Linux laptop.

The course materials can be found at: http://www.archer.ac.uk/training/course-material/2017/12/advOpenMP-imperial/index.php.

9-10 May 2018: Heterogeneous Programming on GPUs with MPI & OmpSs @ BSC (Barcelona, Spain)

This tutorial will motivate the need for portable, efficient programming models that put less pressure on program developers while still delivering good performance for clusters and clusters with GPUs. More specifically, the tutorial will:

  • Introduce the hybrid MPI/OmpSs parallel programming model for future exascale systems
  • Demonstrate how to use MPI/OmpSs to:
    • incrementally parallelize/optimize MPI applications on clusters of SMPs, and
    • leverage CUDA kernels with OmpSs on clusters of GPUs

2-3 July 2018: Concepts of GASPI and interoperability with other communication APIs @ HLRS (Stuttgart, Germany) - course given by T-Systems SfR and Fraunhofer ITWM

In this tutorial we present an asynchronous data flow programming model for Partitioned Global Address Spaces (PGAS) as an alternative to the programming model of MPI. GASPI, which stands for Global Address Space Programming Interface, is a PGAS API designed as a C/C++/Fortran library and focused on three key objectives: scalability, flexibility and fault tolerance. In order to achieve its much improved scaling behaviour, GASPI aims at asynchronous dataflow with remote completion rather than bulk-synchronous message exchanges. GASPI follows a single/multiple program multiple data (SPMD/MPMD) approach and offers a small, yet powerful API (see also http://www.gaspi.de and http://www.gpi-site.com). GASPI is successfully used in academic and industrial simulation applications. Hands-on sessions (in C and Fortran) will allow users to immediately test and understand the basic constructs of GASPI.

16 July 2018: StarPU: A Task-Based Runtime System for Heterogeneous Platform Programming @ Lab'O (Orléans, France) - tutorial given by INRIA

Heterogeneous computing platforms—such as multicores equipped with accelerators—are notoriously difficult to program due to the strong differences in performance characteristics among the various available computing units and also to the discrete memory spaces of accelerating boards. This 4-hour tutorial at the International Conference on High Performance Computing and Simulation (HPCS 2018) will introduce task-based programming, as a way to tame the complexity of heterogeneous architectures. It will present the StarPU runtime system developed at Inria by the STORM Team.

More information at: http://hpcs2018.cisedu.info/4-program/tutorials-hpcs2018

16-20 July 2018: PUMPS+AI Summer School (Programming and Tuning Massively Parallel Systems + Artificial Intelligence) @ BSC (Barcelona, Spain)

The ninth edition of the Programming and Tuning Massively Parallel Systems + Artificial Intelligence summer school (PUMPS+AI) is aimed at enriching the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources, such as GPU accelerators. The final day of this week-long summer school will focus exclusively on OmpSs - a parallel programming model focused on exploiting task-based parallelism for applications written in C, C++ or Fortran, and one of the six key programming APIs on which INTERTWinE is focused.

For further information, including registration details, please see: https://pumps.bsc.es/2018/.

17-19 July 2018: Advanced OpenMP @ University of Cambridge (Cambridge, UK) - course given by EPCC

OpenMP is the industry standard for shared-memory programming, which enables serial programs to be parallelised using compiler directives. This course is aimed at programmers seeking to deepen their understanding of OpenMP and explore some of its more recent and advanced features.

This 3-day course will cover topics including nested parallelism, OpenMP tasks, the OpenMP memory model, performance tuning, hybrid OpenMP + MPI, OpenMP implementations, and new features in OpenMP 4.0/4.5. Hands-on practical programming exercises make up a significant, and integral, part of this course.

Attendees should be familiar with the basics of OpenMP, including parallel regions, data scoping, work sharing directives and synchronisation constructs. Access will be given to appropriate hardware for all the exercises, although many of them can also be performed on a standard Linux laptop.

The course materials can be found at http://www.archer.ac.uk/training/course-material/2018/07/AdvOpenMP-camb/index.php.  

10-11 September 2018: PLASMA and MAGMA software libraries for numerical linear algebra @ VŠB - Technical University Ostrava (Ostrava, Czech Republic) - course given by the University of Manchester in conjunction with the Institute of Mathematics of the Czech Academy of Sciences, and the University of Tennessee

PLASMA (Parallel Linear Algebra Software for Multicore Architectures) and MAGMA (Matrix Algebra on GPU and Multicore Architectures) are software libraries for numerical linear algebra, focusing mainly on dense matrices. The libraries offer functions for solving systems of linear equations with symmetric positive definite as well as general square matrices. Also included are linear least squares solvers, eigenvalue computations, and singular value decomposition.

The main purpose of PLASMA is to address the shortcomings of the widely used LAPACK when running on multicore processors, multi-socket systems of multicore processors, and manycore processors with shared memory. On the other hand, MAGMA aims at utilizing off-load oriented accelerators, especially GPUs. The libraries support both real and complex arithmetic in single and double floating-point precision.

The first day of the tutorial will focus on the PLASMA library. We will explain the concepts of tile data layout and tile algorithms, and their connection to task-based programming in the recent OpenMP standard. The basics of asynchronous execution driven by a directed acyclic graph (DAG) of computational tasks will also be explained.
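
The connection between tile algorithms and OpenMP tasking can be sketched for a tiled Cholesky factorisation as follows (an illustration only, not PLASMA source code; it assumes OpenMP 4.5-style depend clauses). Each tile-kernel call becomes a task, and the depend clauses on the tile pointers encode the edges of the DAG; the *_tile functions are hypothetical placeholders for the usual POTRF/TRSM/SYRK/GEMM tile kernels.

  /* Sketch of a tile algorithm expressed as an OpenMP task DAG.
   * The *_tile functions are hypothetical placeholders for the Cholesky
   * tile kernels; the pointer elements A[i*NT + j] act as proxies for the
   * tiles when computing dependences. */
  void potrf_tile(double *akk);
  void trsm_tile(const double *akk, double *aik);
  void syrk_tile(const double *aik, double *aii);
  void gemm_tile(const double *aik, const double *ajk, double *aij);

  void tile_cholesky(double **A, int NT)      /* A[i*NT + j] is tile (i, j) */
  {
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < NT; k++) {
      #pragma omp task depend(inout: A[k*NT + k])
      potrf_tile(A[k*NT + k]);

      for (int i = k + 1; i < NT; i++) {
        #pragma omp task depend(in: A[k*NT + k]) depend(inout: A[i*NT + k])
        trsm_tile(A[k*NT + k], A[i*NT + k]);
      }

      for (int i = k + 1; i < NT; i++) {
        #pragma omp task depend(in: A[i*NT + k]) depend(inout: A[i*NT + i])
        syrk_tile(A[i*NT + k], A[i*NT + i]);

        for (int j = k + 1; j < i; j++) {
          #pragma omp task depend(in: A[i*NT + k], A[j*NT + k]) depend(inout: A[i*NT + j])
          gemm_tile(A[i*NT + k], A[j*NT + k], A[i*NT + j]);
        }
      }
    }
    /* the implicit barrier at the end of the parallel region completes the DAG */
  }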

The second day of the tutorial will be devoted to the MAGMA library and its functional scope, which differs from that of both LAPACK and PLASMA. Some basic concepts of GPU hardware, interaction and execution will also be covered, providing information on the conceptual and design differences between CPUs and GPUs and their memory storage hierarchies in the context of numerical linear algebra routines.

Further information, including registration details, at: http://prace.it4i.cz/en/PlasMagma-09-2018.

23 September 2018 (afternoon): “MPI + Y” – interoperable APIs for maximising asynchrony @ BSC (Barcelona, Spain) - tutorial given by BSC and T-Systems SfR

This tutorial at the EuroMPI conference features both an introduction and a hands-on session for two topics: the Task-Aware MPI (TAMPI) library, and a recent GASPI extension which leverages the concept of shared windows. The TAMPI library provides a new MPI_TASK_MULTIPLE threading level which facilitates the development of hybrid MPI+OpenMP/OmpSs-2 applications. With MPI_TASK_MULTIPLE, any task can invoke synchronous MPI calls without blocking the underlying hardware thread, thus avoiding potential deadlocks. The GASPI extension, a SHAred Notifications (SHAN) communication library, is primarily aimed at migrating flat MPI legacy codes towards an asynchronous execution model. To achieve this goal, the SHAN library makes use of notified communication, both within and across shared-memory nodes. The SHAN API publishes solver data, as well as the corresponding datatypes, in shared memory. It also leverages one-sided notified GASPI communication (for non-local communication with other nodes) in order to pipeline the packing and unpacking of the published datatypes with communication.

The hands-on will focus on the migration from flat MPI legacy code towards both hybrid MPI + OmpSs-2 and the GASPI SHAN extension.
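
A rough sketch of the TAMPI usage pattern described above is given below, under the assumptions that TAMPI's TAMPI.h header provides the MPI_TASK_MULTIPLE constant requested through MPI_Init_thread and that OpenMP-style depend clauses are acceptable stand-ins for the OmpSs-2 ones used in the tutorial; process() is a hypothetical consumer of the received data.

  /* Rough sketch of the TAMPI idea: a blocking MPI call inside a task.
   * Assumptions: TAMPI.h defines MPI_TASK_MULTIPLE (requested via
   * MPI_Init_thread); OpenMP depend clauses stand in for OmpSs-2 ones;
   * process() is a hypothetical consumer, not a library routine. */
  #include <mpi.h>
  #include <TAMPI.h>    /* assumed TAMPI header defining MPI_TASK_MULTIPLE */

  void process(double *buf, int n);           /* hypothetical consumer */

  void taskified_receive(double *buf, int n, int src)
  {
    #pragma omp task depend(out: buf[0:n])
    {
      /* With the MPI_TASK_MULTIPLE level this blocking receive suspends the
         task, not the hardware thread, so other tasks keep running. */
      MPI_Recv(buf, n, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    #pragma omp task depend(in: buf[0:n])
    process(buf, n);                          /* runs once the data has arrived */
  }

  int main(int argc, char *argv[])
  {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_TASK_MULTIPLE, &provided);  /* TAMPI level */
    /* ... spawn tasks, e.g. taskified_receive(...), then taskwait ... */
    MPI_Finalize();
    return 0;
  }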

Further information, including registration details, at: https://eurompi2018.bsc.es/tutorials.
