Pitagora: changes in OpenMPI modules for UCC settings
Dear users,
Asymmetric source/destination memory types (GPU/Host) for senders and receivers are not yet supported in the Unified Collective Communication (UCC) project. If your application relies on this feature, you may encounter UCC errors such as:
ucc_schedule_pipelined.c:211 UCC ERROR failed to initialize fragment for pipeline
allreduce_sra_knomial.c:234 TL_UCP ERROR failed to init pipelined schedule
allreduce.c:38 TL_UCP ERROR asymmetric src/dst memory types are not supported yet
mpool.c:55 UCX WARN object 0x3077c00 was not returned to mpool tl_ucp_req_mp
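To check whether a job is affected, its output can be scanned for the messages above. The following is a minimal sketch, not an official tool: the function name `count_ucc_messages` and the category labels are hypothetical, and the patterns are simply the message fragments quoted in this announcement.

```python
import re

# Message fragments copied from the error lines quoted in this announcement.
# The category names on the left are hypothetical labels chosen for this sketch.
UCC_PATTERNS = {
    "asymmetric_memtypes": re.compile(
        r"asymmetric src/dst memory types are not supported yet"
    ),
    "pipeline_init": re.compile(
        r"failed to (initialize fragment for pipeline|init pipelined schedule)"
    ),
    "mpool_warning": re.compile(r"was not returned to mpool"),
}


def count_ucc_messages(lines):
    """Count how many lines match each known UCC/UCX message category."""
    counts = {name: 0 for name in UCC_PATTERNS}
    for line in lines:
        for name, pattern in UCC_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# The four example lines from this announcement, used as sample input.
SAMPLE_LOG = [
    "ucc_schedule_pipelined.c:211 UCC ERROR failed to initialize fragment for pipeline",
    "allreduce_sra_knomial.c:234 TL_UCP ERROR failed to init pipelined schedule",
    "allreduce.c:38 TL_UCP ERROR asymmetric src/dst memory types are not supported yet",
    "mpool.c:55 UCX WARN object 0x3077c00 was not returned to mpool tl_ucp_req_mp",
]

if __name__ == "__main__":
    print(count_ucc_messages(SAMPLE_LOG))
```

For a real job, the same function can be applied to the lines of the job's output file, for example `count_ucc_messages(open(path))` with `path` pointing to your own log.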
This is not a blocking issue, as OpenMPI automatically falls back to the next available native collective component. However, to keep these intrusive error messages out of your job output, we have modified the following modules:
hpcx-mpi/2.25.1
openmpi/4.1.6--gcc--12.3.0-ucx1.20
openmpi/5.0.9--gcc--12.3.0-ucx1.20
by removing UCC as the default collective communication component.
We have also updated the documentation with a new section “MPI Advanced Configuration”. It contains further details on this topic, including instructions on how to enable UCC collective communication support in your jobs and override the new default module settings.
Since not all applications are affected by this issue, we recommend testing UCC, especially for GPU-based workloads, as it may provide performance benefits.
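As a rough illustration of what such a test could look like, the sketch below builds a job environment with UCC switched on or off. It relies on two assumptions that are not taken from this announcement: that the UCC collective component is controlled through the Open MPI MCA parameters `coll_ucc_enable` and `coll_ucc_priority`, passed as `OMPI_MCA_`-prefixed environment variables, and that a priority of 100 is high enough for UCC to be selected. The helper name `with_ucc` is hypothetical. The settings actually recommended for Pitagora are those in the “MPI Advanced Configuration” section of the documentation.

```python
import os


def with_ucc(enabled, priority=100, base_env=None):
    """Return a copy of the environment with UCC collectives toggled.

    Assumption: the Open MPI coll/ucc component reads the MCA parameters
    coll_ucc_enable and coll_ucc_priority from OMPI_MCA_* variables.
    The default priority of 100 is an illustrative value, not a site setting.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_coll_ucc_enable"] = "1" if enabled else "0"
    if enabled:
        env["OMPI_MCA_coll_ucc_priority"] = str(priority)
    else:
        # Drop any leftover priority so the disabled run uses the defaults.
        env.pop("OMPI_MCA_coll_ucc_priority", None)
    return env


# Hypothetical usage (./my_app is a placeholder for your own executable):
#   import subprocess
#   subprocess.run(["mpirun", "-np", "4", "./my_app"], env=with_ucc(True))
#   subprocess.run(["mpirun", "-np", "4", "./my_app"], env=with_ucc(False))
```

Running the same case once with each environment and comparing the timings gives a first indication of whether UCC helps your application.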
The new section of the documentation also contains a paragraph about enabling “NVIDIA SHARP” for network offloading of collective operations, which in certain cases may lead to a strong performance improvement. We highly recommend reading the new documentation and testing the new settings yourself to see if your application can take advantage of these features.
Best regards,
HPC User Support – CINECA