HPC programming

As with any other computer, an HPC system can be used with sequential programming: the practice of writing programs that execute one instruction after the other, rather than running instructions simultaneously as in parallel programming.

Parallel programming

There are several ways of writing parallel code, whereas sequential code is in general written in only one way: as a logical sequence of steps.

OpenMP (multithreading)

A popular approach to parallel programming is to write sequential code and mark the specific pieces of code that can be parallelized into threads (the fork-join mechanism, see the figure below from ADMIN magazine). A thread is an independent stream of execution; each thread has its own private memory (such as its stack), while in OpenMP the threads also share the program's memory.

If the threads vary in execution time, some of them may have to wait for the others at the point where they are joined and their data is collected, which wastes execution time. It is up to the programmer to balance the distribution of work among the threads so that execution time is optimized where possible.

Modern CPUs support OpenMP in a natural way, since they usually have multiple cores and each core can execute threads independently. OpenMP is available as an extension to C, C++, and Fortran and is mostly used to parallelize the for loops that constitute a time bottleneck in the software.
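
As an illustration, here is a minimal sketch of the fork-join pattern in C++ (assuming a compiler with OpenMP support, e.g. compiled with g++ -fopenmp): a single pragma distributes the loop iterations over threads, and a dynamic schedule lets faster threads take on extra work so that less time is lost at the join.

```cpp
#include <cstdio>
#include <omp.h>

int main() {
    const int n = 1000000;
    double sum = 0.0;

    // Fork: the iterations are split among the available threads.
    // schedule(dynamic) hands out chunks on demand, which helps balance the
    // load when iterations take different amounts of time; reduction(+:sum)
    // combines the per-thread partial sums at the join.
    #pragma omp parallel for schedule(dynamic, 1000) reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += 1.0 / (i + 1);
    }

    // Join: execution continues on a single thread here.
    printf("sum = %f (up to %d threads available)\n", sum, omp_get_max_threads());
    return 0;
}
```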

| Link | Description |
| --- | --- |
| Video course | A video course held by ARCHER UK (the link points to the first lesson; the other lessons can be found from there). |
| OpenMP Starter | A starting guide for OpenMP. |
| Wikitolearn course | An OpenMP course from Wikitolearn. |
| MIT course | A course from MIT that also covers MPI (see the next section for more about MPI). |

MPI (Message Passing Interface)

MPI is used to distribute data among different processes that could otherwise not access that data (see the picture below, from LLNL).

MPI is considered very hard to learn, but this reputation is mostly due to the fact that the message passing has to be programmed explicitly.
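
As a minimal sketch of what explicit message passing looks like (assuming an MPI installation; compiled with mpic++ and run with e.g. mpirun -np 2), the example below has rank 0 send an array to rank 1, which could not otherwise see the data held in rank 0's memory:

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[4] = {0.0, 0.0, 0.0, 0.0};

    if (rank == 0) {
        // Rank 0 owns the data and sends it explicitly to rank 1.
        for (int i = 0; i < 4; ++i) data[i] = 1.5 * i;
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Without this explicit receive, rank 1 has no way to access
        // the values stored in rank 0's memory.
        MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received: %f %f %f %f\n", data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}
```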

| Link | Description |
| --- | --- |
| Video course | A video course held by ARCHER UK (the link points to the first lesson; the other lessons can be found from there). |
| MPI Starter | A starting guide for MPI. |
| PRACE course | A PRACE course on the MOOC platform FutureLearn. |

GPU programming

GPUs (graphics processing units) are computing accelerators used to boost heavy linear algebra applications, such as deep learning. A GPU features a large number of specialized processing units that allow code to be parallelized to an extreme degree (see the figure below, from astrocomputing).

[Figure: GPU architecture (astrocomputing)]

AMD and Nvidia are the two main producers of GPUs, and the latter has dominated the market for a long time. The Danish HPCs Type 1 and 2 feature various models of Nvidia graphics cards, while Type 5 (LUMI) has the latest AMD Instinct GPUs.

The distinction between AMD and Nvidia matters mainly because the two brands are programmed with different dialects, so software that runs dedicated code on the GPU must be written specifically for each brand.

Nvidia CUDA

CUDA is a C++ dialect that also comes with libraries and bindings for the most popular languages and packages (e.g. Python, PyTorch, MATLAB, ...).
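
As a minimal sketch of what CUDA C++ looks like (assuming an Nvidia GPU and the nvcc compiler), the example below launches a kernel that scales a vector, with one GPU thread handling one element each:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a kernel: a function that runs on the GPU,
// executed by many threads in parallel.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float* x;
    cudaMallocManaged((void**)&x, n * sizeof(float));  // memory visible to CPU and GPU
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // Launch enough blocks of 256 threads to cover all n elements.
    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);  // expected: 2.0
    cudaFree(x);
    return 0;
}
```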

| Link | Description |
| --- | --- |
| Nvidia developer training | Nvidia developer trainings for CUDA programming. |
| Book archive | An archive of books on CUDA programming. |
| Advanced books | Some advanced books on coding with CUDA. |
| pyCUDA | Coding in CUDA with Python. |

AMD HIP

HIP is a recently introduced dialect for AMD GPUs. It has the advantage that it can be compiled for both AMD and Nvidia hardware. CUDA code can be converted to HIP code almost automatically, with some extra adjustments by the programmer.
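
As a minimal sketch (assuming the hipcc compiler from AMD's ROCm stack), the vector-scaling kernel from the CUDA section above looks like this in HIP; the close one-to-one correspondence between cuda* and hip* calls is what makes the nearly automatic conversion (e.g. with AMD's hipify tools) possible:

```cpp
#include <cstdio>
#include <hip/hip_runtime.h>

// The kernel itself is unchanged with respect to the CUDA version.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float* x;
    hipMallocManaged((void**)&x, n * sizeof(float));  // cuda* calls become hip* calls
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);      // same launch syntax as CUDA
    hipDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    hipFree(x);
    return 0;
}
```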

The LUMI HPC consortium has already organized a course on HIP coding and CUDA-to-HIP conversion. Check the PRACE training page for repeated editions of the course.

| Link | Description |
| --- | --- |
| Video introduction 1 | A video introduction to HIP. |
| Video introduction 2 | A video introduction to HIP. |
| AMD programming guide | The HIP programming guide from the producer, AMD. |
Revised 21/05/21