Pitagora: Maintenance Completed and System Upgraded

We are pleased to inform you that the maintenance operations on Pitagora were successfully completed in two of the three scheduled days, and the system is now fully back in production.

During the maintenance, and in agreement with the EuroFusion Operation Committee, several system software and firmware components of the cluster were upgraded to ensure compliance with the latest Lenovo EveryScale Best Recipe, verified by both CINECA and the EuroFusion HLST.

We also took advantage of the production interruption to update the software stack.

Below are the main user‑visible changes.

Internode GPUDirect RDMA – Issue Resolved (Booster Partition)

The known issue affecting UCX GPUDirect RDMA communications on the Booster partition has been resolved. The root cause was a firmware-level incompatibility, which has been fixed by upgrading the firmware on both the compute nodes and the network interface cards (NICs).

GPUDirect RDMA is now fully functional and enabled by default, and no user action is required. Users who previously applied temporary workarounds (e.g. disabling GPUDirect RDMA) are advised to revert them to restore optimal performance. We also warmly recommend recompiling your GPU codes against the new software stack in order to take full advantage of GPUDirect RDMA.
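
As an illustration, a common temporary workaround was to disable GPUDirect RDMA through a UCX environment variable in the job script; the variable shown below is one such example, and your own workaround may have used a different setting:

  # Delete or comment out any leftover workaround in your job scripts, e.g.:
  # export UCX_IB_GPU_DIRECT_RDMA=no
  # GPUDirect RDMA is enabled by default; no UCX variable needs to be set.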

Software Stack Upgrade

The software stack on Pitagora has been updated using spack/0.22.2_6.1.
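
You can browse the rebuilt stack directly from a login node; as a minimal sketch using the standard module commands (hdf5 is just an example package):

  module purge
  module avail hdf5   # lists the rebuilt hdf5 builds, including the new names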

GCC stack:

The compiler and MPI versions remain gcc/12.3.0 and openmpi/4.1.6. The UCX component has been upgraded to version 1.20, provided via NVIDIA HPC‑X. Packages in the GCC/OpenMPI/CUDA stack have been rebuilt and renamed with the suffix -ucx1.20.
Example:
Old: hdf5/1.14.3--openmpi--4.1.6--gcc--12.3.0
New: hdf5/1.14.3--openmpi--4.1.6--gcc--12.3.0-ucx1.20
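
In practice, only the module name to load changes; a minimal sketch of a job-script update, assuming your application uses hdf5 (adapt it to your actual dependencies):

  # Old job script:
  # module load hdf5/1.14.3--openmpi--4.1.6--gcc--12.3.0
  # Updated job script:
  module load hdf5/1.14.3--openmpi--4.1.6--gcc--12.3.0-ucx1.20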

NVHPC stack:

The NVHPC compilers have been updated from nvhpc/24.5 and nvhpc/24.9 to nvhpc/25.11, and the MPI implementation has been upgraded from hpcx-mpi/2.20 to hpcx-mpi/2.25.1.
Example:
Old: hdf5/1.14.3--hpcx-mpi--2.20--nvhpc--24.5
New: hdf5/1.14.3--hpcx-mpi--2.25.1--nvhpc--25.11
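
As above, updating your job scripts is a matter of pointing at the new module names (a sketch under the same assumption that hdf5 stands in for your real dependencies):

  # Old job script:
  # module load hdf5/1.14.3--hpcx-mpi--2.20--nvhpc--24.5
  # Updated job script:
  module load hdf5/1.14.3--hpcx-mpi--2.25.1--nvhpc--25.11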

Further details are available in the documentation.

Best regards,
HPC User Support @ CINECA