Intel Xeon Phi is the first product based on the Intel Many Integrated Core (Intel MIC) architecture. Each card consists of 60 physical cores (at 1.1 GHz), and each core is able to handle up to 4 threads via hyper-threading. Each core has one 512-bit Vector Processing Unit, able to deliver for each clock cycle 16 double-precision (32 single-precision) floating point operations through fused multiply-add instructions.
So the Phi has a peak performance of 60 cores x 1.1 GHz x 16 Flops/cycle, i.e. about 1 TFlops in double precision.
Each Phi coprocessor has 8 GB of RAM and a peak memory bandwidth of 352 GB/s.
The MPSS environment (Intel® Manycore Platform Software Stack) is also available on the front-end. Therefore, you do not need to be logged into a compute node hosting the MIC cards in order to compile code for the MIC. However, you still have to set up the environment for the MIC:
module load intel (i.e. compiler suite)
module load mkl (if necessary – i.e. math libraries)
source $INTEL_HOME/bin/compilervars.sh intel64 (to set up the environment variables)
Now you can compile your code. Note that, depending on how you intend to run your code (offload or native), you have to follow different procedures:
1) For codes meant to be run with the MIC offload attributes, you have to add the proper offload pragmas to your source code and compile it as usual. For example, use the Intel C++ compiler on the "hello_offload.cpp" code (a minimal sketch of such a code is given after the compilation command):
icpc -openmp hello_offload.cpp -o exe-offload.x
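The contents of "hello_offload.cpp" are not shown in this guide; a minimal sketch, assuming a simple OpenMP hello world offloaded through the Intel "offload" pragma, could look like this:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* hypothetical example: the region below is executed on the MIC card,
       while the rest of the program runs on the host */
    #pragma offload target(mic)
    #pragma omp parallel
    printf("Hello from MIC thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}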
2) For MIC-native codes, you have to actually cross-compile by adding the -mmic flag. For example, use the Intel C++ compiler on the "hello_native.cpp" code (again, a sketch follows the compilation command):
icpc hello_native.cpp -openmp -mmic -o exe-native.x
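As above, "hello_native.cpp" is not shown in this guide; a minimal sketch, assuming the whole program runs natively on the card, could be:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* hypothetical example: no offload pragma is needed, since the whole
       executable runs on the MIC card itself */
    #pragma omp parallel
    printf("Hello from native MIC thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}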
Offload programs are executed directly on the node hosting the MIC cards, from an interactive batch session or via a batch script (requesting MIC cards with the nmics parameter). Note that sourcing the compilervars.sh script is required for the node to see the MICs during execution.
qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
module load intel
cd $CINECA_SCRATCH
source $INTEL_HOME/bin/compilervars.sh intel64
./exe-offload.x
#!/bin/bash
#PBS -o job.out
#PBS -j eo
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1:nmics=1
#PBS -q debug
#PBS -A <my_account>
#
module load intel
cd $CINECA_SCRATCH
source $INTEL_HOME/bin/compilervars.sh intel64
./exe-offload.x
MIC-native codes need to be executed on the MIC card itself. In order to log into a MIC card you have to:
qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
...
qstat -f 31085.node129
...
exec_vnode = (node018:mem=1048576kb:ncpus=1+node018-mic1:nmics=1)
...
ssh node018-mic1
$
At this point you will find yourself in the home directory of the MIC card you've logged into. Here the usual environment variables are not set; therefore, the module command won't work, and your scratch space (which is mounted on the MIC card) has to be referenced by its full path instead of via $CINECA_SCRATCH.
In order to execute your native-MIC program, you need to set the LD_LIBRARY_PATH environment variable manually, adding the path of the Intel libraries built for MIC execution:
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic:${LD_LIBRARY_PATH}
You may also need to add the paths of the MKL and/or TBB (Intel® Threading Building Blocks) MIC libraries:
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/tbb/lib/mic:${LD_LIBRARY_PATH}
When everything is ready, you can launch your code as usual.
cd /gpfs/scratch/userexternal/<myuser>
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/tbb/lib/mic:${LD_LIBRARY_PATH}
./exe-native.x
In order to compile an MPI application suited for the MICs, you need the MPSS environment (Intel® Manycore Platform Software Stack) to be set up:
module load intel (i.e. compiler suite)
module load intelmpi (i.e. mpi library)
module load mkl (if necessary – i.e. math libraries)
source $INTEL_HOME/bin/compilervars.sh intel64 (to set up the environment variables)
export I_MPI_MIC=enable (to enable mpi on MIC)
Now you can compile your code. For MIC-native codes, you have to cross-compile by adding the -mmic flag. For programs written in C use the "mpicc" command; for programs written in Fortran use "mpifc":
mpicc -O3 -mmic mpi_code.c
...
mpifc -O3 -mmic mpi_code.f
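The "mpi_code.c" source is not shown in this guide; a minimal sketch, assuming a plain MPI hello world, could be:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    /* hypothetical example: each task reports its rank */
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from MPI task %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}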
MIC-native codes can be launched from the MIC node, once you have obtained your MIC card through qsub (node018-mic1 in the example):
qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
qstat -f 31085.node129
...
exec_vnode = (node018:mem=1048576kb:ncpus=1+node018-mic1:nmics=1)
...
Once you know your MIC card (node018-mic1 in the example), you can launch your MPI program (using 30 tasks in the example). Beforehand, MPI on the MIC must be enabled by setting the I_MPI_MIC environment variable:
export I_MPI_MIC=enable
mpirun.mic -host node018-mic1 -np 30 ./a.out
Attention: use only the "mpirun.mic" command; "mpiexec" does not work correctly.
If you need to pass some environment variables, you have to use the "-genv" flag:
export I_MPI_MIC=enable
mpirun.mic -genv I_MPI_DEBUG 0 -genv I_MPI_PIN 1 -host node018-mic0 -np 30 ./a.out
If you want to use two MIC cards (you have to ask for two MICs by setting "nmics=2" in your qsub request), you can set the number of tasks per card via the "-perhost" flag:
export I_MPI_MIC=enable
mpirun.mic -host node018-mic0,node018-mic1 -perhost 15 -np 30 ./a.out
Alternatively, you can provide a suitable hostfile:
export I_MPI_MIC=enable
mpirun.mic -machinefile hostfile -np 30 ./a.out
....
cat hostfile
node018-mic0
node018-mic1
In multi-MIC applications, especially when a large number of MPI processes is used, you might also need to set explicitly the (otherwise automatic) selection of the DAPL providers:
export I_MPI_MIC=enable
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1,ofa-v2-scif0,ofa-v2-mlx4_0-1
The above setting also suppresses the warnings that might appear in the standard error file (reporting "librdmacm: couldn't read ABI version...").
You can compile your hybrid MIC-native codes as shown before, using mpicc or mpifc with the -openmp and -mmic flags:
mpicc -O3 -openmp -mmic hyb_code.c
...
mpifc -O3 -openmp -mmic hyb_code.f
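The "hyb_code.c" source is likewise not shown; a minimal hybrid MPI+OpenMP sketch could be:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    /* hypothetical example: each MPI task spawns its own team of OpenMP threads */
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    #pragma omp parallel
    printf("Task %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());
    MPI_Finalize();
    return 0;
}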
Then launch your code, using a batch script as shown before, distributing the MPI tasks among the MICs and exporting all the needed environment variables:
...
export I_MPI_MIC=enable
mpirun.mic -host node018-mic0,node018-mic1 -perhost 1 -np 2 \
  -genv LD_LIBRARY_PATH /cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic/ \
  -genv OMP_NUM_THREADS 120 ./a.out
In this example each MIC card hosts one MPI task, and each task spawns 120 OpenMP threads.
Here you can find some examples, together with source code, for native mode (OpenMP, MPI, and hybrid parallelization) on the MIC.
At present, the use of the MICs and other accelerators is not accounted for; only the time spent on the CPUs is considered.
More details about "Accounting" can be found in the UserGuide (http://www.hpc.cineca.it/content/accounting-0).