rsync command

The following examples show how to use rsync, from the command line and via batch scripts, to transfer data from, to, and between CINECA HPC machines. The rsync parameters are tuned for the CINECA internal network.

For very large data sets (>~ 500 GB), CINECA staff strongly recommends using Globus Online via the GridFTP protocol.

Fermi

Galileo/Pico

FERMI

rsync via command line

You can launch rsync from the command line in the following ways:

#CINECA-FERMI <-> CINECA-FERMI (up to 10 min)
rsync --timeout=600 -r -avzHS --bwlimit=80000 --block-size=1048576 --progress \
    <data_path_from> <data_path_to>

#CINECA-FERMI -> LOCAL/HPC machine
rsync --timeout=600 -r -avzHS --bwlimit=80000 --block-size=1048576 --progress \
    username@login.fermi.cineca.it:<data_path_from> .

#LOCAL/HPC machine -> CINECA-FERMI
rsync --timeout=600 -r -avzHS --bwlimit=80000 --block-size=1048576 --progress \
    <data_path_from> username@login.fermi.cineca.it:<data_path_to>

Please note that, on CINECA clusters, the maximum CPU time available from the command line is 10 minutes.

If your rsync connection is killed after this time (e.g. for big files > 10 GB) and your transfer has not completed, re-execute the rsync command in this way:

rsync --timeout=600 -r -avzH --append --sparse --progress \
    <data_path_from> <data_path_to>

Rsync will restart the transfer by appending to the partially transferred file on the destination (--append option). If necessary, repeat this command until the data transfer is complete.
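The repeat-until-complete pattern can also be scripted as a simple loop that re-runs the command until it exits with status 0. A minimal sketch (the `transfer` function is a stand-in that simulates a transfer failing twice before succeeding; in practice you would replace its body with the rsync --append command above):

```shell
#!/bin/sh
# Retry a command until it succeeds (exit status 0), pausing briefly
# between attempts. Replace 'transfer' with the real rsync --append line.

attempts=0

transfer() {
    # Stand-in for:
    #   rsync --timeout=600 -r -avzH --append --sparse --progress SRC DST
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]   # simulate: fail twice, then succeed
}

until transfer; do
    sleep 1   # brief pause before retrying
done

echo "transfer completed after $attempts attempts"
```

With a real rsync command in place of the stand-in, each retry resumes the interrupted file thanks to --append.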

Alternatively, you can use the cinrsync script, available in the superc module.

module load superc

cinrsync -h (for usage help)

The cinrsync script automatically re-launches rsync several times until the transfer is complete.

If SSH key authentication has not been set up between the hosts, you will be asked for the password at every restart of rsync.
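To avoid the repeated password prompts, you can set up SSH key authentication once. A typical sketch (the hostname is an example; adapt it to the machines involved):

```shell
# Generate a key pair on the source host, if you do not have one yet
# (press Enter to accept the defaults; an empty passphrase avoids prompts).
ssh-keygen -t rsa

# Copy the public key into the destination host's ~/.ssh/authorized_keys.
# ssh-copy-id is the standard OpenSSH helper; it asks for the password once.
ssh-copy-id username@login.fermi.cineca.it

# From now on, rsync/ssh to that host should not ask for a password.
```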

For data transfers that require more than 10 minutes, a good approach is to launch the rsync command via a batch job.

rsync via batch job

Single step

If your data copy requires up to 6 hours, you can launch rsync via a batch file with a single step. This gives you up to 6 hours of time limit for the data copy without consuming your budget, since the job runs in the serial queue (login nodes).

Example

#!/bin/bash
# CINECA-FERMI <-> CINECA-FERMI
#
# @ output      = myjob.$(jobid).out
# @ error       = myjob.$(jobid).err
# @ wall_clock_limit = 06:00:00
# @ job_type    = serial
# @ class       = serial
# @ queue
rsync --timeout=600 -r -avzHS --bwlimit=80000 --block-size=1048576 --progress \
    <data_path_from> <data_path_to>

Multiple steps

If your data copy requires more than 6 hours, you can run a multi-step job. Each step of this job has up to 6 hours of time limit and resumes the copy from the file where the previous step was interrupted.

Example

#!/bin/bash
# CINECA-FERMI <-> CINECA-FERMI
#
# @ error  = myjob.$(jobid).$(stepid).err
# @ output = myjob.$(jobid).$(stepid).out
# @ wall_clock_limit = 06:00:00
# @ job_type = serial
# @ class    = serial
#
# @ step_name = step00
# @ job_type  = serial
# @ class     = serial
# @ queue
#
# @ step_name  = step01
# @ dependency = step00 >= 0
# @ job_type   = serial
# @ class      = serial
# @ queue
#
# @ step_name  = step02
# @ dependency = step01 >= 0
# @ job_type   = serial
# @ class      = serial
# @ queue

case $LOADL_STEP_NAME in
step00)
    rsync --timeout=600 -avHS -r --bwlimit=80000 --block-size=1048576 --progress \
        <data_path_from> <data_path_to>
    ;;
step01)
    rsync --timeout=600 -avHS -r --bwlimit=80000 --block-size=1048576 --progress \
        <data_path_from> <data_path_to>
    ;;
step02)
    rsync --timeout=600 -avHS -r --bwlimit=80000 --block-size=1048576 --progress \
        <data_path_from> <data_path_to>
    ;;
esac
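Assuming the script above is saved as, say, myjob.ll (a hypothetical name), it is submitted with the LoadLeveler commands:

```shell
llsubmit myjob.ll   # submit the multi-step job
llq -u $USER        # check the status of your jobs
```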

If you need to transfer your data from Fermi to another HPC machine via batch job, modify the rsync command in this way:

#CINECA-FERMI -> HPC machine
rsync --timeout=600 -r -avzH --bwlimit=80000 --block-size=1048576 --sparse --progress \
    <data_path_from> username@hpcmachine_hostname:<data_path_to> (*)

(*) you have to add the public key of the Fermi user to the "$HOME/.ssh/authorized_keys" file
of the target HPC machine.


GALILEO/PICO


rsync via command line

You can launch the rsync command from the command line in this way:

#CINECA-GALILEO/PICO <-> CINECA-GALILEO/PICO
rsync --timeout=600 -avzHS -r --bwlimit=80000 --block-size=1048576 --progress \
    <data_path_from> <data_path_to>

#LOCAL/HPC machine -> CINECA-GALILEO
rsync --timeout=600 -avzHS -r --bwlimit=80000 --block-size=1048576 --progress -e ssh \
    <data_path_from> username@login.galileo.cineca.it:<data_path_to>

#CINECA-GALILEO -> LOCAL/HPC machine
rsync --timeout=600 -avzHS -r --bwlimit=80000 --block-size=1048576 --progress -e ssh \
    username@login.galileo.cineca.it:<data_path_from> <data_path_to>

#LOCAL/HPC machine -> CINECA-PICO
rsync --timeout=600 -avzHS -r --bwlimit=80000 --block-size=1048576 --progress -e ssh \
    <data_path_from> username@login.pico.cineca.it:<data_path_to>

#CINECA-PICO -> LOCAL/HPC machine
rsync --timeout=600 -avzHS -r --bwlimit=80000 --block-size=1048576 --progress -e ssh \
    username@login.pico.cineca.it:<data_path_from> <data_path_to>

If you have to transfer a large file (> 10 GB), you can launch rsync via a batch job:

 

Rsync via batch job (CINECA GALILEO/PICO <-> CINECA GALILEO/PICO only)

Single step

If your data copy requires up to 4 hours, you can launch rsync via a batch file with a single step. This gives you up to 4 hours of time limit for the data copy without consuming your budget, since the job runs in the archive queue (login nodes).

Batch single step job example

#!/bin/bash
#PBS -l walltime=4:00:00
#PBS -l select=1:mpiprocs=1
## PBS -N myjob
#PBS -o rsync$job_id.out
#PBS -e rsync$job_id.err
#PBS -q archive
#### Load Modules
. /cineca/prod/environment/module/3.1.6/none/init/bash
module purge
## Move to the directory from which you submitted the job, i.e. the working dir
cd $PBS_O_WORKDIR
## Define the source and destination folders

sourc=/gpfs/scratch/........      ## no trailing / here (copies the directory itself)
dest=/shared/project/data/......  ## trailing / here

### Launch rsync

rsync -avHS -r $sourc $dest > logrsync.out
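Assuming the script is saved as, e.g., rsync_job.pbs (a hypothetical name), submit and monitor it with the standard PBS commands:

```shell
qsub rsync_job.pbs   # submit to the archive queue
qstat -u $USER       # check the status of your jobs
```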

 

Multiple steps

If your data copy requires more than 4 hours, you can run a multi-step job. Each step of this job has up to 4 hours of time limit and resumes the copy from the file where the previous step was interrupted.

Batch multi-step job example
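One possible sketch of a multi-step setup under PBS, which has no LoadLeveler-style steps in a single script: chain several copies of the single-step script with job dependencies (the script name rsync_job.pbs is a hypothetical example; afterany means the next job starts regardless of the previous job's exit status):

```shell
# Submit three identical rsync jobs, each starting only after the
# previous one has finished. Since rsync -a skips files already copied,
# each job resumes the copy where the previous one stopped.
job1=$(qsub rsync_job.pbs)
job2=$(qsub -W depend=afterany:$job1 rsync_job.pbs)
job3=$(qsub -W depend=afterany:$job2 rsync_job.pbs)
```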