A new way to create and manage CycleCloud Slurm clusters on Azure. Slurm is one of the most popular and widely used open-source workload managers for AI/HPC and cloud computing.…
Introduction: This blog walks through the manual steps for exporting data to Blob Storage while retaining specific POSIX attributes. The export is performed using the Lustre…
Industry-leading high-bandwidth memory (HBM) capacity and bandwidth targeting generative inferencing and AI training. Artificial intelligence is transforming every industry and creating new opportunities for innovation and growth. On top…
Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High-Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar HPC schedulers, and…
Dr. Wolfgang De Salvador - EMEA GBB HPC/AI Infrastructure Senior Specialist; Dr. Darko Mocelj - EMEA GBB HPC/AI Infrastructure Senior Specialist. Resources and references used in this article: Repository…
HPC AI Infrastructure Summit: Discover how Microsoft, AMD, and your peers are revolutionizing their infrastructure with HPC+AI. Join us for…
Azure is excited to announce that hibernation support has been extended to the following General Purpose VM sizes with up to 64 GB of RAM: Esv5-series, Edsv5-series, Easv5-series, and Eadsv5-series. In addition, as previously announced in…
It was great to see you at the NVIDIA GTC AI Conference in San Jose, CA March 18 - 21! We hope the Microsoft Azure AI infrastructure demos and sessions…
What a difference a year makes! Last year I said 2022 was a banner year for AI developments… if 2022 was a banner year then how should we describe 2023?…
By: Mark Gitau, Software Engineer, and Hugo Affaticati, Technical Program Manager 2. Useful resources: New NC H100 v5-series: Microsoft NC H100 v5-series; Thought leadership article: Aka.ms/Blog/MLPerfInfv4; Azure results for MLPerf Inference: MLPerf…
Azure NC H100 v5 virtual machines (VMs) are an excellent platform for executing diverse AI and High-Performance Computing (HPC) workloads. These workloads demand substantial computational power, large capacity of high-performance…
We're excited to announce that in April, Azure will be offering customers the ability to optimize GPU compute costs by enabling hibernation on Virtual Machines (VMs). With this feature,…
SLURM (Simple Linux Utility for Resource Management) is a highly configurable open-source workload manager used in high-performance computing (HPC) environments. Job accounting is a crucial aspect of SLURM, allowing system…
CycleCloud 8.6: What's New and How to Get Started. Learn about the latest features and enhancements of CycleCloud, the leading cloud HPC orchestration platform. CycleCloud is a powerful tool that…
Co-written with Erik Garcia, WEKA Director of Cloud Sales, Brian Markenson, WEKA System Engineer, and Adam Fowler, WEKA System Engineer. High Performance Compute (HPC) grids in the Financial Services Industry are unique…
Dr. Wolfgang De Salvador - EMEA GBB HPC/AI Infrastructure Senior Specialist; Dr. Kai Neuffer - Principal Program Manager, Industry and Partner Sales - Energy Industry. Resources and references used in…
Would you like to have a single script to quickly provision High Performance Computing (HPC) clusters with access to several ready-to-use HPC applications (WRF, GROMACS, OpenFOAM, and many more) so…
Authored by: Aimee Garcia, PM AI Benchmarking, Rick Shahid, Azure HPC + AI Media and Entertainment Solution Architect, Isayah Reed, Senior Software Engineer, Jon Weisner, Director, Global Black Belt –…
OpenFOAM, which stands for Open Field Operation and Manipulation, is a free, open-source software framework primarily used for computational fluid dynamics (CFD) simulations. It provides a wide range of numerical…
Welcome to the new era where AI is driving innovation and rapidly changing what applications look like, how they’re designed and built, and how they’re delivered. Nearly every industry…
Overview: Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar HPC…
Azure Machine Learning is a service that enables you to create, train, deploy, and manage machine learning models and experiments. You can also use it to run HPC applications, such…
This marks my final blog post of 2023: a basic demonstration of the difference in speed between CPU and GPU in AI training. The primary aim of this…
Last month we first announced ND MI300X v5 virtual machines (VMs) and talked about how they were the culmination of a long-term partnership between Microsoft and AMD. We also talked about…
Introduction: This blog post walks through how to set up an Azure Managed Lustre Filesystem (AMLFS) that automatically synchronises to an Azure Blob Storage container. The synchronisation is achieved using the Lustre…
Generative AI promises to revolutionize how we live, work, and play. Like prior technology innovations, it has been developed on increasingly complex, high-performance semiconductor platforms and systems. As…
Today at Ignite, Microsoft is announcing the public preview of the NC H100 v5 Virtual Machine Series, the latest addition to our portfolio of purpose-built infrastructure for High Performance Computing (HPC) and Artificial Intelligence…
In our relentless pursuit of pushing the boundaries of artificial intelligence, we understand that cutting-edge infrastructure and expertise are needed to harness the full potential of advanced AI. At Microsoft,…
Overview: Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar…
The Azure family of visualization GPU VMs has gained a new member. Many customers have used the VMs in our existing portfolio for a variety of visualization workloads as well…
Introduction: Container technologies are no longer new to the industry. They started out focused on deploying reproducible development environments, but now you can find many other fields…
In the cloud computing world, infrastructure-as-code (IaC) refers to the practice of managing and provisioning infrastructure through code. As someone with a development background, this is my preferred alternative to manually provisioning…
Come visit Microsoft at Supercomputing 2023 (SC23) November 12 – 17 where we’ll deep dive into high-performance computing (HPC) and AI solutions during an exciting week of sessions, hands on experiences and…
Understanding Azure Batch for HPC: Azure Batch is a cloud-based service provided by Microsoft Azure that's designed to handle HPC workloads efficiently. It's known for its simplicity and versatility,…
Profiling AI/ML models is a pivotal step in harnessing the full potential of computational resources, especially when deploying on high-performance platforms like single/multi-GPUs. This process delves deep into the model's…
Azure CycleCloud allows the creation of resources to run High Performance Computing (HPC) applications based on widely used job schedulers such as PBS, SLURM, and LSF. Once CycleCloud is installed,…
There is a known behaviour in Lustre when a VM that has the Lustre filesystem mounted is evicted or deleted as part of a workflow without releasing the filesystem lock. Lustre…
Azure Managed Lustre delivers the time-tested Lustre file system as a first-party managed service on Azure. Long-time users of Lustre on-premises can now leverage the benefits of a complete HPC solution,…
There are a lot of different products you need to successfully complete a high-performance computing (HPC) workload. You’ll hear several terms regularly, like virtual machines, CPUs, GPUs, compute power, and…
Introduction: In the realm of high-performance computing (HPC) and AI workloads, the need for agile and powerful storage solutions cannot be overstated. Azure Managed Lustre (AMLFS) has emerged as a…
The world of high-performance computing (HPC) and simulation just got a major boost as Altair® Unlimited™, the state-of-the-art virtual private cloud appliance, has become available on…
Lustre has long been the gold standard for extreme performance and scalability amongst parallel file systems used in HPC. The open-source community has continually improved Lustre features and performance to…
Overview: This blog discusses how to easily integrate Azure Managed Lustre Filesystem into a CycleCloud HPC cluster using a custom project named cyclecloud-amlfs. Azure Managed Lustre delivers the time-tested Lustre file…
Azure Managed Lustre: Lustre is an open-source parallel filesystem for high-performance computing that began as a research project in 1999. Its name is the fusion of Linux and cluster,…
By Hugo Affaticati (Technical Program Manager), Sonal Doomra (Technical Program Manager 2), and Jon Shelley (Principal TPM Manager). Introduction: Azure is pleased to showcase results from our MLPerf Training…
Suppose you are deploying an embarrassingly parallel HPC application in Azure Virtual Machine Scale Sets (VMSSs) and realize that (i) SSH into VM instances is possible even when they have not…
Introduction: In a previous blog post we showed how to deploy an optimal NDm_v4 AKS cluster, i.e., all 8 InfiniBand and GPU devices on each NDm_v4 are installed…
Article contributed by Amirreza Rastegari, Jon Shelley, Scott Moe, Jie Zhang, Jithin Jose, Anshul Jain, Jyothi Venkatesh, Joe Greenseid, Fanny Ou, and Evan Burness. Azure has announced the general availability…
1. Introduction: We are excited to share with you the latest benchmarks for Barracuda Virtual Reactor on the Azure NDv2 ('Standard_ND40rs_v2') and ND A100 v4 ('Standard_ND96asr_v4') virtual machines (VMs), accelerated by NVIDIA…