Amazon’s New EC2 Trn2 Instances and UltraServers: Raising the Bar for AI/ML Performance

December 12, 2024 · 3 min read

Introduction

AWS recently rolled out some major updates at re:Invent 2024, unveiling their new Amazon EC2 Trn2 instances and Trn2 UltraServers—both powered by the latest generation of AWS Trainium2 chips. For those of us watching the AI/ML infrastructure space, this announcement could mark a significant shift in how we train and deploy large-scale models. In this post, we’ll unpack the key points from AWS’s news, highlight the technical breakthroughs, and share our perspective on what this means for enterprise-grade machine learning.

Source: Amazon EC2 Trn2 instances and UltraServers

Why This Announcement Matters

The machine learning ecosystem is evolving faster than ever, with models growing more complex and resource-intensive. Traditional approaches to scalability, whether "scaling up" a single machine or "scaling out" across many nodes, haven't always kept pace with the demands of foundation models, large language models (LLMs), and real-time inference. The new Trn2 instances and UltraServers seem tailor-made to address these bottlenecks, offering a mix of raw compute power, improved memory bandwidth, and better price performance.

From our vantage point, this feels like AWS putting a stake in the ground: they’re not just iterating on existing GPU solutions; they’re offering an alternative path that may appeal to organizations feeling the constraints of GPU-based training, both technically and financially.

Inside the Trn2 Instances

AWS Trainium2 chip

At the heart of each Trn2 instance is the AWS Trainium2 chip. Each chip houses eight NeuronCores and packs 96 GiB of high-bandwidth memory. The stats are impressive: up to 1.3 petaflops of dense FP8 compute and up to 5.2 petaflops of sparse FP8 compute per chip. Multiply that across the 16 chips in a single Trn2 instance and you get roughly 20.8 petaflops of dense FP8 compute and 1.5 TiB of HBM per instance, a substantial bump in performance over the first-generation Trn1.

Source: Amazon EC2 Trn2 Instances and Trn2 UltraServers for AI/ML training
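As a quick sanity check on those figures, here's a back-of-the-envelope calculation. The per-chip numbers come straight from AWS's announcement; the instance-level aggregates are simple multiplication:

```python
# Per-chip specs for AWS Trainium2, as stated in AWS's announcement.
CHIP_HBM_GIB = 96            # high-bandwidth memory per chip
CHIP_DENSE_FP8_PFLOPS = 1.3  # dense FP8 compute per chip
CHIP_SPARSE_FP8_PFLOPS = 5.2 # sparse FP8 compute per chip
NEURONCORES_PER_CHIP = 8

def aggregate(num_chips: int) -> dict:
    """Scale the per-chip specs up to a multi-chip system."""
    return {
        "neuroncores": num_chips * NEURONCORES_PER_CHIP,
        "hbm_tib": num_chips * CHIP_HBM_GIB / 1024,
        "dense_fp8_pflops": num_chips * CHIP_DENSE_FP8_PFLOPS,
        "sparse_fp8_pflops": num_chips * CHIP_SPARSE_FP8_PFLOPS,
    }

# One Trn2 instance: 16 chips -> 128 cores, 1.5 TiB HBM, 20.8 / 83.2 PFLOPS.
print(aggregate(16))
```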

Also worth noting is the 3.2 Tbps Elastic Fabric Adapter (EFA) v3 bandwidth, which AWS claims reduces latency by 35% over the previous generation. If these numbers hold true in real-world scenarios, it could streamline the training of large, distributed models—an increasingly common need in enterprises running multi-terabyte training sets.

Cost Performance: A Not-So-Subtle Nudge Away from GPUs

AWS states that Trn2 instances deliver 30–40% better price performance than their GPU-based P5e and P5en instances. If you’re currently locked into a GPU-based stack, this might give you pause. While switching to a different type of accelerator (in this case, Trainium2) isn’t trivial, the cost savings and performance gains may justify the migration effort—particularly for companies running persistent large-scale training jobs.

Introducing the Trn2 UltraServers: Going Beyond Single Instances

For organizations working at the bleeding edge—think trillion-parameter models—AWS unveiled the Trn2 UltraServers. These are essentially four Trn2 instances stitched together using a high-bandwidth, low-latency NeuronLink interconnect. The result? A powerhouse with:

  • 64 Trainium2 chips
  • 512 NeuronCores
  • 6 TiB of HBM
  • Up to 83 petaflops of dense FP8 compute (332 petaflops sparse FP8)

The UltraServers aim to eliminate bottlenecks at scale, making it easier to train massive models and run lightning-fast inference. For real-time services, this could mean quicker response times, and for R&D teams tackling frontier models, a reduced training loop time. It’s a bold attempt to solve the fundamental scaling challenges that come with the territory of cutting-edge AI.
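Plugging 64 chips into the aggregate helper from the sketch above reproduces those bullets exactly (AWS rounds the petaflop figures down):

```python
print(aggregate(64))
# {'neuroncores': 512, 'hbm_tib': 6.0,
#  'dense_fp8_pflops': 83.2, 'sparse_fp8_pflops': 332.8}
```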

Real-World Adoption and What It Means

According to AWS, tens of thousands of Trainium chips are already powering Amazon’s internal workloads and AWS services—like the Rufus shopping assistant that was deployed during Prime Day. Additionally, Trainium2 chips are fueling latency-optimized versions of large models such as Llama 3.1 405B and Claude 3.5 on Amazon Bedrock.

From a user’s standpoint, seeing these chips in action in production scenarios should help build confidence. It’s one thing to tout performance in a keynote; it’s another to say, “We use this ourselves for massive, business-critical workloads.”

Getting Your Hands on Trn2 Instances

Trn2 instances are now generally available in the US East (Ohio) Region. AWS is offering a reservation system—EC2 Capacity Blocks for ML—where you can book up to 64 instances for up to six months. For users who need steady, predictable access to these high-end resources, this reservation model could smooth out capacity planning headaches.
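For teams that script their capacity planning, the Capacity Blocks workflow is also exposed through the EC2 API. Here's a minimal sketch with boto3; the instance count, date range, and duration are illustrative assumptions, so check the EC2 documentation for the exact options supported for Trn2:

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-2")  # Trn2 launched in US East (Ohio)

# Search for available Capacity Block offerings; parameter values are illustrative.
offerings = ec2.describe_capacity_block_offerings(
    InstanceType="trn2.48xlarge",
    InstanceCount=4,
    StartDateRange=datetime.now(timezone.utc),
    EndDateRange=datetime.now(timezone.utc) + timedelta(days=30),
    CapacityDurationHours=14 * 24,  # a two-week block
)["CapacityBlockOfferings"]

for offering in offerings:
    print(offering["CapacityBlockOfferingId"], offering["StartDate"], offering["UpfrontFee"])

# Purchase the first matching block (commented out so the sketch has no side effects).
# ec2.purchase_capacity_block(
#     CapacityBlockOfferingId=offerings[0]["CapacityBlockOfferingId"],
#     InstancePlatform="Linux/UNIX",
# )
```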

On the software side, the AWS Deep Learning AMIs come preconfigured with frameworks like PyTorch and JAX. If you’ve been using the AWS Neuron SDK (a toolchain for optimizing ML frameworks for Trainium), migrating your applications to Trn2 should be relatively painless. This aligns with AWS’s broader strategy of making these new instances as accessible as possible to existing ML workflows.
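If you're curious what Trainium-targeted PyTorch code looks like, here's a minimal sketch using the Neuron SDK's PyTorch support (the torch-neuronx package, which builds on PyTorch/XLA). The model and data below are placeholders; a real workload would swap in its own model and input pipeline:

```python
import torch
import torch_xla.core.xla_model as xm  # installed as part of the torch-neuronx stack

# On a Trn2 instance with the Neuron SDK, the XLA device maps onto NeuronCores.
device = xm.xla_device()

# Placeholder model and synthetic data, purely for illustration.
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    y = torch.randn(32, 1024, device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # cut the lazy XLA graph so the step executes on the accelerator
```

The takeaway is that existing PyTorch training loops need little more than a device swap and a graph-boundary call, which is what makes the migration story plausible.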

Conclusion

AWS’s introduction of EC2 Trn2 instances and Trn2 UltraServers brings new performance and memory capabilities to the AI/ML infrastructure landscape. With potentially appealing price points, these offerings may influence how businesses approach large-scale training and inference.

For organizations on the cutting edge, the biggest question may not be “Is this faster?” but rather “What new possibilities does this unlock?” We’ll be following closely to find out.
