Leonardo: Network Instabilities

Dear users,

We regret to inform you that compute nodes are disconnecting from the InfiniBand (IB) network at an increased rate. This makes jobs more likely to hang or fail, especially jobs that span a large number of nodes. Because the Leonardo Booster and DCGP partitions share the same network, both partitions are equally affected.

We are monitoring the state of Leonardo's network and working to mitigate the problem, and we will keep you updated via HPC-News. In the meantime, we suggest that you monitor your jobs, check whether any of them are hanging, and cancel them if necessary.
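One possible way to automate that check is sketched below. This is not a CINECA-provided tool. It assumes your jobs run under Slurm, so `scancel` is available, and that a healthy job writes to its output log regularly. The stall heuristic (log not modified for 30 minutes), the threshold, and the helper names `log_is_stalled` and `cancel_if_stalled` are all hypothetical, so adapt them to your own workload. You can list your running jobs and their IDs with `squeue -u $USER`.

```python
import os
import subprocess
import time

# Hypothetical threshold: treat a job as possibly hung if its output log
# has not been written to for this many seconds. Tune it for your workload.
STALL_SECONDS = 30 * 60


def log_is_stalled(log_path, stall_seconds=STALL_SECONDS, now=None):
    """Return True if the log file was last modified more than stall_seconds ago."""
    if now is None:
        now = time.time()
    last_write = os.path.getmtime(log_path)
    return (now - last_write) > stall_seconds


def cancel_if_stalled(job_id, log_path, stall_seconds=STALL_SECONDS, dry_run=True):
    """Cancel a Slurm job with scancel if its log looks stalled.

    With dry_run=True (the default) the scancel command is only printed,
    so nothing is cancelled until you deliberately pass dry_run=False.
    Returns True if the job was judged stalled, False otherwise.
    """
    if not log_is_stalled(log_path, stall_seconds):
        return False
    cmd = ["scancel", str(job_id)]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return True
```

For example, `cancel_if_stalled(1234567, "slurm-1234567.out")` would report whether that (hypothetical) job looks hung. Keeping `dry_run=True` as the default lets you confirm the heuristic matches your job's logging behaviour before letting it cancel anything, since a quiet log does not always mean a hung job.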

We apologize for the inconvenience.

Kindest regards,
HPC User Support @ CINECA