Inference performance of Llama 3.1 8B using vLLM across various GPUs and CPUs

Introduction

Following our previous evaluation of Llama 3.1 8B inference performance on Azure’s ND-H100-v5 infrastructure using vLLM, this report broadens the scope to compare inference performance across a range of GPU and CPU platforms. Using the Hugging Face inference benchmarker, we assess not only throughput and latency but also the…
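As a minimal sketch of the two headline metrics such a benchmark reports, the snippet below reduces raw per-request samples to aggregate throughput (output tokens per second over the run) and mean per-request latency. The function name, inputs, and sample numbers are illustrative assumptions, not part of the Hugging Face inference benchmarker or any measured result.

```python
def summarize(latencies_s, output_tokens, wall_time_s):
    """Reduce raw benchmark samples to two headline metrics:
    aggregate throughput (output tokens/s over the whole run)
    and mean per-request latency (s).

    latencies_s   -- per-request end-to-end latencies, in seconds
    output_tokens -- tokens generated by each request
    wall_time_s   -- total wall-clock duration of the benchmark run
    """
    throughput = sum(output_tokens) / wall_time_s
    mean_latency = sum(latencies_s) / len(latencies_s)
    return throughput, mean_latency


# Illustrative numbers only, not measured results:
tp, lat = summarize([0.8, 1.0, 1.2], [128, 128, 128], wall_time_s=2.0)
print(f"{tp:.0f} tok/s, {lat:.2f} s mean latency")  # 192 tok/s, 1.00 s
```

Note that throughput is computed against wall-clock time rather than the sum of latencies: with concurrent requests, the two differ, which is why batching servers like vLLM can raise throughput even as individual latencies grow.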
