Optimizing AI Workloads on Azure: CPU Pinning via NCCL Topology file

(Co-authored by: Rafael Salas, Sreevatsa Anantharamu, Jithin Jose) Introduction NCCL NVIDIA Collective Communications Library (NCCL) is one of the most widely used communication library for AI training/inference. It features GPU-focused collective and point-to-point communication designs that are vital for AI workloads. NUMA-GPU-HCA affinity Non-uniform memory access (NUMA) architecture splits up…

Learn More
Share:

You may be interested in

What you're searching for?

Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors