Auto-tuning of the FFTW Library for Massively Parallel Supercomputers

Massimiliano Guarrasi (Cineca), Giovanni Erbacci (Cineca) and Andrew Emerson (Cineca)

This paper presents work carried out by CINECA within the PRACE-2IP project to improve the performance of the FFTW library by refining the auto-tuning mechanism the library already implements. The optimization comprised the following activities:

  • Identification of the major bottlenecks present in the current FFTW implementation;
  • Investigation of the auto-tuning mechanism provided in FFTW in order to understand how performance is affected by domain decomposition;
  • Introduction of a new parallel domain decomposition;
  • Construction of a library to improve the performance of the auto-tuning mechanism.
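As a rough illustration of what auto-tuning means in this context: FFTW's planner can time several candidate ways of computing a transform on the target machine and keep the fastest. The sketch below mimics that idea in miniature, assuming only two candidate algorithms. It is not FFTW's code and not the library built in this work; the names `dft_direct`, `fft_radix2` and `choose_plan` are hypothetical and exist only for this example.

```python
import cmath
import time


def dft_direct(x):
    """O(n^2) direct evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    # Twiddle factors applied to the odd-indexed half.
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return [even[k] + tw[k] for k in range(half)] + [
        even[k] - tw[k] for k in range(half)
    ]


def choose_plan(candidates, sample, repeats=3):
    """Time each candidate on a sample input and return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, func in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(sample)
            # Keep the best of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


candidates = {"direct": dft_direct, "radix2": fft_radix2}
sample = [complex(i % 7, 0) for i in range(64)]
plan = choose_plan(candidates, sample)
spectrum = candidates[plan](sample)
```

The selection is made by measurement rather than by a fixed rule, so the chosen plan can differ from one machine or problem size to another. In a parallel setting the choice of domain decomposition becomes one more candidate dimension to tune.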

In particular, we compared the performance of the standard Slab Decomposition algorithm already present in FFTW with that of the new 2D Domain Decomposition, and found that on massively parallel supercomputers the new algorithm performs significantly better.
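One reason a 2D decomposition can help at scale is purely geometric: a slab decomposition of an N x N x N grid splits a single axis, so at most N processes can own data, whereas a 2D (pencil) decomposition splits two axes and can keep up to N x N processes busy. The sketch below only counts local block shapes under that assumption. It is not the implementation described in the whitepaper, performs no communication or transforms, and all function names are hypothetical.

```python
def split(n, parts):
    """Sizes of `parts` nearly equal contiguous blocks covering n points."""
    base, extra = divmod(n, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def slab_local_shapes(n, nprocs):
    """Slab (1D) decomposition of an n*n*n grid: split only the first axis."""
    return [(nx, n, n) for nx in split(n, nprocs)]


def pencil_local_shapes(n, prow, pcol):
    """2D (pencil) decomposition: split the first two axes over a
    prow x pcol process grid."""
    return [(nx, ny, n) for nx in split(n, prow) for ny in split(n, pcol)]


def idle_ranks(shapes):
    """Number of processes that end up owning no grid points."""
    return sum(1 for shape in shapes if 0 in shape)


n = 64
slab = slab_local_shapes(n, 256)          # 256 processes, one split axis
pencil = pencil_local_shapes(n, 16, 16)   # same 256 processes as a 16 x 16 grid

print(idle_ranks(slab))    # 192: only 64 slabs exist for 256 processes
print(idle_ranks(pencil))  # 0: every process owns a 4 x 4 x 64 pencil
```

With 256 processes on a 64-point-per-side grid, the slab layout leaves three quarters of the processes without data, while the 16 x 16 pencil layout gives each process an equal share. The trade-off, not modelled here, is that a 2D decomposition needs more data-transposition steps between the one-dimensional transforms.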

Read more: PRACE whitepaper