EURORA Guide for Intel Xeon Phi (MIC) Usage

Architecture

Intel Xeon Phi is the first product based on the Intel Many Integrated Core (Intel MIC) architecture. Each card consists of 60 physical cores (@ 1.1 GHz), and each core can handle up to 4 threads using hyperthreading. Each core has one Vector Processing Unit able to deliver, for each clock cycle:

  • 8 Fused Multiply and Add (FMA) floating point operations in double precision
  • 16 Fused Multiply and Add (FMA) floating point operations in single precision.

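Counting each FMA as two floating point operations, the Phi has a peak performance of

  • 1056 GFlops in double precision (60 cores × 1.1 GHz × 16 operations per cycle)
  • 2112 GFlops in single precision (60 cores × 1.1 GHz × 32 operations per cycle).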
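
The peak figures are the product of cores, clock frequency, FMA operations per cycle and 2 (each FMA counts as one multiply plus one add). A minimal shell sketch of that arithmetic follows. The function name peak_gflops is hypothetical, and the clock is passed in MHz so that integer shell arithmetic is enough.

```shell
# Hypothetical helper: peak GFlops = cores * clock (MHz) * FMA per cycle * 2 / 1000.
# Each FMA counts as two floating point operations (one multiply, one add).
peak_gflops() {
    cores=$1
    clock_mhz=$2
    fma_per_cycle=$3
    echo $(( cores * clock_mhz * fma_per_cycle * 2 / 1000 ))
}

peak_gflops 60 1100 8     # double precision
peak_gflops 60 1100 16    # single precision
```

The two calls reproduce the double and single precision figures quoted in this section.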

Each Phi coprocessor has 8 GB of RAM and a peak memory bandwidth of 352 GB/s.

Compilation

The MPSS environment (Intel® Manycore Platform Software Stack) is also available on the front-end. You therefore do not need to be logged into a compute node hosting the MIC cards to compile code that will run on a MIC. You still have to set up the environment for the MIC, however:

module load intel    # compiler suite
module load mkl      # math libraries, if necessary
source $INTEL_HOME/bin/compilervars.sh intel64    # set up the environment variables

 

Now you can compile your code. The procedure depends on how you intend to run it (offload or native):

1) For codes meant to be run with the MIC offload attributes, you have to add the proper pragmas in your source code and compile it as usual. For example, use the Intel C++ compiler on the "hello_offload.cpp" code:

icpc -openmp hello_offload.cpp -o exe-offload.x

 

2) For MIC-native codes, you have to cross-compile by adding the -mmic flag. For example, use the Intel C++ compiler on the "hello_native.cpp" code:

icpc hello_native.cpp -openmp -mmic -o exe-native.x


Execution

Offload programs are executed directly on the MIC node, either from an interactive batch session or from a batch script (requesting MIC cards with the nmics parameter). Note that sourcing the compilervars.sh script is required for the node to see the MICs during execution.

Offload programs execution through an interactive batch session
qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
module load intel
cd $CINECA_SCRATCH
source $INTEL_HOME/bin/compilervars.sh intel64
./exe-offload.x

 

Offload programs execution through a batch script
#!/bin/bash
#PBS -o job.out
#PBS -j eo
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1:nmics=1
#PBS -q debug
#PBS -A <my_account>
# 
module load intel
cd $CINECA_SCRATCH
source $INTEL_HOME/bin/compilervars.sh intel64
./exe-offload.x

 

MIC-native codes need to be executed inside the MIC card itself. To log into a MIC card you have to:

  • log in to a MIC node with a PBS interactive session requesting at least 1 MIC (nmics=1);
  • use the "qstat -f <job_id>" command to get the name of the specific MIC card assigned to you;
  • connect through ssh to the MIC card (node018-mic1 in the example).
qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
...
qstat -f 31085.node129
...
exec_vnode = (node018:mem=1048576kb:ncpus=1+node018-mic1:nmics=1)
...
ssh node018-mic1
$
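
The card name can also be extracted from the "qstat -f" output with a small filter. The sketch below assumes that card names follow the <node>-mic<N> pattern seen in the example and that the exec_vnode entry is not wrapped over several lines. The function name mic_cards is hypothetical.

```shell
# Hypothetical helper: print the MIC card names found in "qstat -f" output read from stdin.
# Assumes names of the form <node>-mic<N>, as in the example session.
mic_cards() {
    grep 'exec_vnode' | grep -o '[A-Za-z0-9]*-mic[0-9][0-9]*'
}

# Example with the exec_vnode line from the session above. In a real session
# one would pipe the output of "qstat -f <job_id>" into mic_cards instead.
echo 'exec_vnode = (node018:mem=1048576kb:ncpus=1+node018-mic1:nmics=1)' | mic_cards
```

With more than one card in the job, each name is printed on its own line.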

 

At this point you are in the home directory of the MIC card you have logged into. The usual environment variables are not set here, so the module command will not work, and your scratch space (which is mounted on the MIC card) has to be given with its full path instead of $CINECA_SCRATCH.

To execute your MIC-native program, you need to set the LD_LIBRARY_PATH environment variable manually, adding the path of the Intel libraries specific to MIC execution:

export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic:${LD_LIBRARY_PATH}

You may also need to add the paths of the MKL and/or TBB (Intel® Threading Building Blocks) MIC libraries:

export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/tbb/lib/mic:${LD_LIBRARY_PATH}

When everything is ready, you can launch your code as usual.

cd /gpfs/scratch/userexternal/<myuser>
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/tbb/lib/mic:${LD_LIBRARY_PATH}
./exe-native.x
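
The repeated exports can be wrapped in a small function. The sketch below assumes a POSIX shell on the MIC card. The names prepend_ld_path and INTEL_MIC_ROOT are hypothetical, while the installation prefix is the one used in the paths above. Unlike the plain exports, it avoids leaving a trailing colon when LD_LIBRARY_PATH is initially empty, since the dynamic loader treats an empty entry as the current directory.

```shell
# Hypothetical helper: put a directory in front of LD_LIBRARY_PATH.
prepend_ld_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$1:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$1"    # first entry: no trailing colon
    fi
    export LD_LIBRARY_PATH
}

# Hypothetical variable holding the installation prefix used in this guide.
INTEL_MIC_ROOT=/cineca/prod/compilers/intel/cs-xe-2013/binary
prepend_ld_path "$INTEL_MIC_ROOT/lib/mic"
prepend_ld_path "$INTEL_MIC_ROOT/mkl/lib/mic"    # only if MKL is used
prepend_ld_path "$INTEL_MIC_ROOT/tbb/lib/mic"    # only if TBB is used
echo "$LD_LIBRARY_PATH"
```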

 

MPI Compilation

To compile an MPI application for the MICs, you need the MPSS environment (Intel® Manycore Platform Software Stack) to be set up:

module load intel       # compiler suite
module load intelmpi    # MPI library
module load mkl         # math libraries, if necessary
source $INTEL_HOME/bin/compilervars.sh intel64    # set up the environment variables
export I_MPI_MIC=enable    # enable MPI on MIC

 

Now you can compile your code. For MIC-native codes, you have to cross-compile by adding the -mmic flag. Use the "mpicc" command for programs written in C and the "mpifc" command for programs written in Fortran:

mpicc -O3 -mmic mpi_code.c
...
mpifc -O3 -mmic mpi_code.f

 

MPI Execution

MIC-native codes can be launched from the MIC node once you have obtained your MIC card through qsub (node018-mic1 in the example):

qsub -A <account_name> -I -l select=1:ncpus=1:nmics=1 -q debug
qsub: waiting for job 31085.node129 to start
qsub: job 31085.node129 ready
qstat -f 31085.node129
...
exec_vnode = (node018:mem=1048576kb:ncpus=1+node018-mic1:nmics=1)

...

Once you know your MIC card (node018-mic1 in the example) you can launch your MPI program (with 30 tasks in the example). Before doing so, MPI on MIC must be enabled by setting the I_MPI_MIC environment variable:

export I_MPI_MIC=enable
mpirun.mic  -host node018-mic1 -np 30  ./a.out

Attention: use only the "mpirun.mic" command; "mpiexec" doesn't work correctly.

If you need to pass environment variables to the MPI processes, use the "-genv" flag:

export I_MPI_MIC=enable
mpirun.mic -genv I_MPI_DEBUG 0 -genv I_MPI_PIN 1 -host node018-mic0 -np 30  ./a.out

 

If you want to use two MIC cards (you have to ask for two MICs by setting "nmics=2" in your qsub request), you can set the number of tasks per card with the -perhost option:

export I_MPI_MIC=enable
mpirun.mic -host node018-mic0,node018-mic1 -perhost 15 -np 30 ./a.out
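
For an even distribution, the total number of tasks given to -np is the number of tasks per card times the number of cards. The sketch below only prints the corresponding command line and does not run it. The function name mic_mpirun_cmd is hypothetical, and ./a.out is the executable name used in the examples.

```shell
# Hypothetical helper: print (not run) the mpirun.mic command for an even
# distribution of tasks_per_card MPI tasks over the given MIC cards.
mic_mpirun_cmd() {
    tasks_per_card=$1
    shift
    hosts=$(printf '%s,' "$@")
    hosts=${hosts%,}                 # drop the trailing comma
    np=$(( tasks_per_card * $# ))    # total tasks = tasks per card * number of cards
    echo "mpirun.mic -host $hosts -perhost $tasks_per_card -np $np ./a.out"
}

mic_mpirun_cmd 15 node018-mic0 node018-mic1
```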

Alternatively, you can list the MIC cards in a hostfile, one per line:

export I_MPI_MIC=enable
mpirun.mic -machinefile hostfile -np 30 ./a.out
....
cat hostfile
node018-mic0
node018-mic1
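
The hostfile can be written by hand or generated from the card names assigned to the job. A minimal sketch follows, in which the function name write_hostfile is hypothetical and the card names are those of the example:

```shell
# Hypothetical helper: write one MIC host name per line to the given file.
write_hostfile() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

write_hostfile hostfile node018-mic0 node018-mic1
cat hostfile
```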

In multi-MIC applications, especially when a large number of MPI processes is used, you might also need to set the (otherwise automatic) selection of DAPL providers explicitly:

export I_MPI_MIC=enable
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1,ofa-v2-scif0,ofa-v2-mlx4_0-1

The above setting also cures the warnings which might appear in the standard error file (reporting "librdmacm: couldn't read ABI version...").

 Hybrid (OpenMP-MPI) Execution

You can compile your MIC-native hybrid codes as shown before, using mpicc (or mpifc) with both the -openmp and -mmic flags.

mpicc -O3 -openmp -mmic hyb_code.c
...
mpifc -O3 -openmp -mmic hyb_code.f

 

Then launch your code, using a batch script as shown before, distributing the MPI tasks among the MICs and exporting all the environment variables needed.

...
export I_MPI_MIC=enable
mpirun.mic -host node018-mic0,node018-mic1 -perhost 1 -np 2 \
    -genv LD_LIBRARY_PATH /cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic/ \
    -genv OMP_NUM_THREADS 120 \
    ./a.out

In this example each MIC runs one MPI task, and each task spawns 120 OpenMP threads.
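
One way to choose OMP_NUM_THREADS is to divide the hardware threads of a card among the MPI tasks placed on it. The sketch below assumes that the 120 threads of the example correspond to 2 threads on each of the 60 cores; this is an interpretation, not something the guide states. The function name omp_threads_per_task is hypothetical.

```shell
# Hypothetical helper: OpenMP threads per MPI task on one MIC card.
# threads per task = cores * threads per core / MPI tasks per card
omp_threads_per_task() {
    cores=$1
    threads_per_core=$2
    tasks_per_card=$3
    echo $(( cores * threads_per_core / tasks_per_card ))
}

# Assumption: 60 cores, 2 threads per core, 1 MPI task per card.
omp_threads_per_task 60 2 1
```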

 

Some examples

Here you'll find some examples, together with source code, for native mode (OpenMP, MPI and hybrid parallelization) on MIC.

Accounting

At present the use of the MICs and other accelerators is not accounted; only the time spent on the CPUs is considered.

More details about "Accounting" can be found in the UserGuide (http://www.hpc.cineca.it/content/accounting-0).
