Amazon Web Services (AWS) has introduced Amazon Elastic Compute Cloud (EC2) Capacity Blocks for machine learning (ML), a new offering that lets users reserve Nvidia GPUs for a specified duration.
This service caters to tasks like training machine learning models and conducting experiments with existing models. In a blog post, Channy Yun, a principal developer advocate for AWS, highlighted the innovative nature of this feature, describing it as a way to reserve GPU instances for future use, tailored to the user's specific time requirements.
"With EC2 Capacity Blocks, you can reserve hundreds of GPUs collocated in EC2 UltraClusters designed for high-performance ML workloads, using Elastic Fabric Adapter (EFA) networking in a peta-bit scale non-blocking network, to deliver the best network performance available in Amazon EC2," Yun wrote in the blog post.
"This is an innovative new way to schedule GPU instances where you can reserve the number of instances you need for a future date for just the amount of time you require," he added.
Amazon: Growing Demand for GPUs
Amazon emphasized the growing demand for GPU capacity in machine learning. That demand has outpaced the industry's ability to supply it, as more companies now run large language models, which require access to GPUs.
The most popular GPUs so far come from Nvidia, and that popularity has made them expensive and often in short supply. As a result, GPUs have become a scarce resource, particularly for customers whose capacity needs vary with their current phase of research and development.
Amazon's announcement of EC2 Capacity Blocks for ML aims to address this challenge. It introduces a usage model that simplifies access to GPU instances for training and deploying ML and generative AI models.
The offering lets users reserve hundreds of GPUs colocated in EC2 UltraClusters, which are optimized for high-performance ML workloads. The clusters are connected by Elastic Fabric Adapter (EFA) networking, which AWS says delivers the best network performance available in Amazon EC2.
Like Reserving a Hotel Room
The reservation process for EC2 Capacity Blocks has been likened to booking a hotel room. Users specify the start date, duration, and size of their reservation, much as a hotel guest picks the dates of a stay and the type and size of bed.
On the designated start date, users gain access to their reserved EC2 Capacity Block and can launch their P5 instances. At the end of the reservation period, any instances still running are automatically terminated.
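The reservation window described above can be illustrated with a minimal Python sketch. The `CapacityBlockReservation` class and its fields are hypothetical names chosen for this example, not part of any AWS API, and the assumption that access ends exactly at the end of the reservation period is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CapacityBlockReservation:
    """Hypothetical model of a reservation; not an AWS API type."""
    start: datetime       # date the reservation begins
    duration_days: int    # how long the block is held
    instance_count: int   # size of the reservation, in instances

    @property
    def end(self) -> datetime:
        # The reservation ends a fixed number of days after it starts.
        return self.start + timedelta(days=self.duration_days)

    def can_run(self, now: datetime) -> bool:
        # Instances may run only inside the window: start inclusive, end exclusive.
        return self.start <= now < self.end


reservation = CapacityBlockReservation(
    start=datetime(2024, 1, 10), duration_days=3, instance_count=4
)
print(reservation.end)                             # 2024-01-13 00:00:00
print(reservation.can_run(datetime(2024, 1, 11)))  # True
print(reservation.can_run(datetime(2024, 1, 13)))  # False
```

Treating the end of the window as exclusive mirrors the behavior in the article: once the reservation period is over, instances are no longer allowed to run.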
The service suits scenarios where users need assured capacity to train or fine-tune ML models, run experiments, or prepare for anticipated spikes in demand for ML applications.
For other workload types that demand compute capacity assurance, such as critical business applications or those subject to regulatory requirements, On-Demand Capacity Reservations continue to be available.
To reserve an EC2 Capacity Block, users navigate to the Capacity Reservations section of the Amazon EC2 console in the US East (Ohio) Region. The console presents two capacity reservation options; users select "Purchase Capacity Blocks for ML" to begin the process.
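For readers who prefer scripting to the console, the same search could plausibly be driven from Python. The sketch below only assembles the search parameters; `build_offering_query` is a hypothetical helper, the `p5.48xlarge` instance type is an assumption, and the parameter names assume the keyword arguments accepted by the boto3 EC2 client's `describe_capacity_block_offerings` call, which should be checked against the current boto3 documentation.

```python
from datetime import datetime, timedelta


def build_offering_query(instance_count: int, start: datetime, duration_days: int) -> dict:
    """Assemble search parameters for a Capacity Block offering (illustrative helper)."""
    if instance_count < 1 or duration_days < 1:
        raise ValueError("instance_count and duration_days must be positive")
    return {
        "InstanceType": "p5.48xlarge",  # assumed P5 instance size
        "InstanceCount": instance_count,
        "StartDateRange": start,        # earliest acceptable start
        "EndDateRange": start + timedelta(days=duration_days),  # latest acceptable end
        "CapacityDurationHours": duration_days * 24,
    }


query = build_offering_query(instance_count=4, start=datetime(2024, 1, 10), duration_days=3)
print(query["CapacityDurationHours"])  # 72

# With boto3 installed and AWS credentials configured, the query might be used as:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-2")  # US East (Ohio)
#   offerings = ec2.describe_capacity_block_offerings(**query)
```

Keeping the parameter assembly separate from the network call makes the date arithmetic easy to check before any reservation is searched for or purchased.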