Galileo reconfiguration and upgrade


Dear Users,
It is a pleasure for us to announce a further upgrade of our Tier-1 HPC system (GALILEO), which will lead to a new, larger cluster supporting national and European research: about one thousand Broadwell nodes connected by a new Intel Omni-Path internal network. We expect to put the new system into production in about 1-2 months, with only minor interruptions to production during the course of the operations. User data in $HOME and $WORK will be copied to the new storage system; data in $CINECA_SCRATCH will not be transferred (see the remarks below).
The reconfiguration phase starts today, and the upgrade will be organized in two successive steps:
– in the first step, the new (Broadwell) nodes will be set up on the new network and production moved there;
– in the second step, the present compute nodes will be upgraded and merged with the partition set up in the first step.
 
During the first step, activities will continue on a reduced set of the present compute nodes; users should expect longer waiting times for pending jobs, together with possible service disruptions on the login nodes.
In the second step, activities will switch, transparently to users, to the nodes reconfigured in the first step.
Please note that a stop of about one month is required for the nodes hosting the Tesla K80 GPUs used by academic users: these nodes will be out of production during July (the exact date of the stop will be announced in future communications).
The rest of production will instead be stopped only for a few days of standard maintenance during the course of the operations.
 
Please also note the following remarks:
1) the environment on the new Galileo will remain the same (modules, scheduler configuration, etc.);
2) the data in the personal $HOME and project $WORK directories will be copied to the new storage system (with a final synchronization during a stop of a few days between the two steps);
3) the data in $CINECA_SCRATCH will not be transferred, and new, empty scratch directories will be created. The old scratch areas will remain accessible (in read-only mode) on the login nodes, so as to allow users to copy all relevant data to the new storage or to archive it (a minimal example of such a copy is sketched below).
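
For users who need to preserve old scratch data, the sketch below shows one way the copy could be scripted from a login node. It is an illustration only: the old-scratch mount point (/old_scratch) is a hypothetical placeholder to be replaced with the actual read-only path, while $USER and the new $CINECA_SCRATCH are assumed to be defined as usual.

```python
#!/usr/bin/env python3
"""Sketch: save data from the old (read-only) scratch area onto the new
storage. The old-scratch path below is hypothetical -- check the actual
mount point on the login nodes before running."""
import os
import shutil
import tarfile
from pathlib import Path

OLD_SCRATCH = Path("/old_scratch") / os.environ["USER"]  # hypothetical, read-only
NEW_SCRATCH = Path(os.environ["CINECA_SCRATCH"])         # new, empty scratch area

def archive_old_scratch(src: Path, dst_dir: Path) -> Path:
    """Pack the whole old scratch tree into one compressed tarball
    on the new storage, preserving relative paths."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    archive = dst_dir / "old_scratch_backup.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)
    return archive

def copy_selected(src: Path, dst: Path, patterns=("*.dat", "*.out")) -> None:
    """Alternatively, copy only the files matching the given glob
    patterns, recreating the directory layout under dst."""
    for pattern in patterns:
        for f in src.rglob(pattern):
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy preserving timestamps/permissions

if __name__ == "__main__":
    print("Archived to:", archive_old_scratch(OLD_SCRATCH, NEW_SCRATCH))
```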
 
Additional information on the status of the operations will be provided regularly during their entire course.
 
Best regards,
HPC User Support @ CINECA