NVIDIA Unveils the Vera Rubin NVL72: Full Specs and Platform Breakdown
Seven chips, five racks, 60 exaflops. Everything announced in NVIDIA's Vera Rubin platform at GTC 2026.
NVIDIA announced the Vera Rubin platform at GTC 2026: a seven-chip, five-rack AI supercomputer architecture designed for agentic AI workloads. The platform is in full production as of Q1 2026, with partner availability in H2 2026.
TLDR
- Vera Rubin NVL72 packs 72 Rubin GPUs and 36 Vera CPUs into a single liquid-cooled rack delivering 3.6 EFLOPS of NVFP4 inference and 2.5 EFLOPS of training compute.
- Seven co-designed chips span compute, networking, storage, and security—including the Groq 3 LPU from NVIDIA's $20B acqui-hire of Groq.
- NVIDIA claims 5x inference performance and 10x lower cost per token versus Blackwell at the rack level.
- A full Vera Rubin POD scales to 40 racks, 1,152 GPUs, and 60 exaflops.
- First deployments expected from AWS, Google Cloud, Microsoft, OCI, and CoreWeave in H2 2026.
Platform Overview
The Vera Rubin platform comprises seven chips across five rack-scale systems. Each chip serves a dedicated function across compute, networking, and data management.
The Seven Chips
Below is a detailed breakdown of the specifications for each of the seven co-designed chips that form the Vera Rubin platform.
Rubin GPU
- Process: TSMC 3nm, dual-die design
- Transistors: 336 billion
- Memory: 288 GB HBM4, 22 TB/s bandwidth
- Inference Performance: 50 PFLOPS NVFP4 (5x Blackwell)
- Training Performance: 35 PFLOPS NVFP4 (3.5x Blackwell)
- Transformer Engine: 3rd generation with hardware-accelerated adaptive compression
- Power Profiles: Max Q (~1.8 kW TGP) and Max P (~2.3 kW TGP)
Vera CPU
- Cores: 88 custom Arm-based Olympus cores (Armv9.2), 176 threads via Spatial Multithreading
- Memory: Up to 1.5 TB SOCAMM LPDDR5X, 1.2 TB/s bandwidth
- NVLink-C2C: 1.8 TB/s coherent bandwidth (7x PCIe Gen 6)
- Role: Orchestration, workload scheduling, KV cache routing, agentic workflow control plane
- NVIDIA's first standalone data center CPU. Jensen Huang called it "already for sure going to be a multi-billion dollar business."
Groq 3 LPU (LP30)
- Origin: NVIDIA's $20B acqui-hire of Groq (December 2025); replaces the previously announced Rubin CPX
- Generation: 3rd generation (manufactured by Samsung)
- On-Chip Memory: ~500 MB stacked SRAM per chip, ~80 TB/s bandwidth per chip
- Function: Decode-phase inference acceleration—Rubin GPUs handle prefill (processing input contexts), Groq 3 LPUs handle decode (token generation at low latency)
- When paired with Rubin NVL72, NVIDIA claims 35x higher inference throughput per megawatt for trillion-parameter models.
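The prefill/decode split described above can be sketched in a few lines. This is an illustrative mock of the serving pattern only, assuming nothing about NVIDIA's or Groq's actual software stack; all names (`KVCache`, `prefill`, `decode`) are hypothetical, and the "model" is a stand-in computation.

```python
# Hypothetical sketch of disaggregated prefill/decode serving: a
# compute-bound prefill stage (the Rubin GPU role) builds the KV cache
# over the whole input context, then a latency-sensitive decode stage
# (the Groq 3 LPU role) generates tokens one at a time from that cache.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    prompt_tokens: list[int]
    generated: list[int] = field(default_factory=list)

def prefill(prompt_tokens: list[int]) -> KVCache:
    """Prefill: process the full input context in one parallel pass."""
    return KVCache(prompt_tokens=list(prompt_tokens))

def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Decode: emit one token per step, reading the cached context."""
    for _ in range(max_new_tokens):
        # Stand-in for a real forward pass over the cached context.
        next_tok = (sum(cache.prompt_tokens) + len(cache.generated)) % 50_000
        cache.generated.append(next_tok)
    return cache.generated

cache = prefill([101, 7592, 2088, 102])   # prefill tier builds the cache
tokens = decode(cache, max_new_tokens=4)  # decode tier generates tokens
print(len(tokens))  # 4
```

The point of the split is that the two stages have opposite bottlenecks (FLOPS vs memory bandwidth), so disaggregating them lets each run on hardware sized for its bottleneck.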
NVLink 6 Switch
- Bandwidth: 3.6 TB/s bidirectional per GPU (2x Blackwell)
- Per Tray: 28.8 TB/s switching bandwidth, 14.4 TFLOPS FP8 in-network compute
- Per NVL72 Rack: 9 switch trays, 260 TB/s aggregate scale-up bandwidth
- In-network compute accelerates all-to-all collective operations for MoE routing.
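The all-to-all collective mentioned above is the core data movement in MoE routing. The sketch below emulates that exchange in plain Python to show the communication pattern; in a real deployment this would be an NCCL all-to-all across GPUs, and the token/expert setup here is invented for illustration.

```python
# Illustrative all-to-all token dispatch for MoE routing: each rank
# buckets its tokens by destination expert, then the collective delivers
# every bucket to the rank hosting that expert.

def all_to_all(send_buffers):
    """send_buffers[i][j] = data rank i sends to rank j.
    Returns recv_buffers where recv_buffers[j][i] is that same data."""
    n = len(send_buffers)
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]

# 2 ranks, each holding 3 tokens routed to one of 2 experts
# (expert e is assumed to live on rank e).
tokens_per_rank = [[("t0", 0), ("t1", 1), ("t2", 0)],   # rank 0's tokens
                   [("t3", 1), ("t4", 0), ("t5", 1)]]   # rank 1's tokens

# Bucket each rank's tokens by destination rank...
send = [[[t for t, e in toks if e == dst] for dst in range(2)]
        for toks in tokens_per_rank]
# ...then exchange buckets so every expert sees all of its tokens.
recv = all_to_all(send)
print(recv[0])  # [['t0', 't2'], ['t4']]  -> expert 0's tokens
print(recv[1])  # [['t1'], ['t3', 't5']]  -> expert 1's tokens
```

Because every rank talks to every other rank each layer, this collective is latency- and bandwidth-critical, which is why pushing parts of it into the switch fabric pays off.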
ConnectX-9 SuperNIC
- Throughput: 1.6 Tb/s per GPU, 8 NICs per compute tray
- Function: Scale-out connectivity (rack-to-rack) via Spectrum-X Ethernet or Quantum-X800 InfiniBand
BlueField-4 DPU
- Architecture: Dual-die design pairing a 64-core Grace CPU with an integrated ConnectX-9 NIC
- vs BF-3: 2x bandwidth, 3x memory bandwidth, 6x compute
- Function: Offloads networking, storage, encryption, and security enforcement
- Powers the new Inference Context Memory Storage Platform (CMX), extending GPU memory into NVMe storage for KV cache data.
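The tiering idea behind CMX can be illustrated with a toy two-level cache: hot KV blocks stay in fast GPU memory, cold ones spill to NVMe and are promoted back on access. This is a minimal sketch under assumed behavior; the class name, the simple LRU policy, and the dict-backed "tiers" are illustrative, not the CMX implementation.

```python
# Toy two-tier KV cache: an LRU-ordered fast tier (standing in for HBM)
# that evicts its coldest blocks to a slow tier (standing in for NVMe
# reached through the DPU), with promotion back on access.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity_blocks: int):
        self.hbm = OrderedDict()   # fast tier, LRU-ordered
        self.nvme = {}             # slow tier
        self.capacity = hbm_capacity_blocks

    def put(self, block_id, kv_block):
        self.hbm[block_id] = kv_block
        self.hbm.move_to_end(block_id)
        while len(self.hbm) > self.capacity:          # spill coldest block
            cold_id, cold_block = self.hbm.popitem(last=False)
            self.nvme[cold_id] = cold_block

    def get(self, block_id):
        if block_id in self.hbm:                      # fast-tier hit
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        kv_block = self.nvme.pop(block_id)            # slow-tier hit: promote
        self.put(block_id, kv_block)
        return kv_block

cache = TieredKVCache(hbm_capacity_blocks=2)
for i in range(3):
    cache.put(f"blk{i}", f"kv{i}")  # blk0 spills to the slow tier
print(sorted(cache.nvme))            # ['blk0']
cache.get("blk0")                    # promoted back; blk1 spills
print(sorted(cache.nvme))            # ['blk1']
```

The payoff is capacity: long agentic contexts can keep far more KV state resident than HBM alone allows, at the cost of occasional promotion latency.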
Spectrum-6 Ethernet Switch
- Aggregate Bandwidth: 102.4 Tb/s
- Co-Packaged Optics: First NVIDIA switch with CPO, co-developed with TSMC ("COUPE")
- Efficiency: 5x power efficiency, 10x resiliency vs prior Spectrum-X generations
The Five Rack Systems
Below is a breakdown of the five rack-scale systems that form the Vera Rubin platform.
Vera Rubin NVL72
- GPUs: 72 Rubin GPUs + 36 Vera CPUs
- Inference: 3.6 EFLOPS NVFP4
- Training: 2.5 EFLOPS NVFP4
- HBM4: 20.7 TB capacity, 1.6 PB/s bandwidth
- LPDDR5X: 54 TB
- Scale-Up Bandwidth: 260 TB/s (NVLink 6)
- Cooling: 100% liquid, 45°C inlet water (enables free cooling without chillers in most climates)
- Assembly: Full rack assembled in ~2 hours (down from ~2 days); compute tray swap in ~5 minutes
- Rack: ~4,000 lbs, ~1,300 chips, 1.3M components
- Power: ~190 kW (Max Q) / ~230 kW (Max P)
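The rack-level figures above follow directly from the per-chip specs earlier in this article; a quick back-of-envelope check (quoted figures appear lightly rounded):

```python
# Sanity-checking the NVL72 rack aggregates from the per-chip specs.
GPUS, CPUS = 72, 36

inference_eflops = GPUS * 50 / 1000        # 50 PFLOPS NVFP4 per Rubin GPU
training_eflops  = GPUS * 35 / 1000        # 35 PFLOPS NVFP4 per Rubin GPU
hbm_tb           = GPUS * 288 / 1000       # 288 GB HBM4 per GPU
hbm_bw_pbs       = GPUS * 22 / 1000        # 22 TB/s HBM4 bandwidth per GPU
lpddr_tb         = CPUS * 1.5              # up to 1.5 TB SOCAMM per Vera CPU
nvlink_tbs       = round(GPUS * 3.6, 1)    # 3.6 TB/s NVLink 6 per GPU

print(inference_eflops)  # 3.6    EFLOPS (quoted: 3.6)
print(training_eflops)   # 2.52   EFLOPS (quoted: 2.5)
print(hbm_tb)            # 20.736 TB     (quoted: 20.7)
print(hbm_bw_pbs)        # 1.584  PB/s   (quoted: 1.6)
print(lpddr_tb)          # 54.0   TB     (quoted: 54)
print(nvlink_tbs)        # 259.2  TB/s   (quoted: 260)
```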
Groq 3 LPX Rack
- LPUs: 256
- On-Chip SRAM: 128 GB aggregate
- Memory Bandwidth: 40 PB/s
- Scale-Up Bandwidth: 640 TB/s per rack
- Designed to pair with NVL72 for disaggregated prefill/decode inference.
Vera CPU Rack
- CPUs: 256 Vera CPUs
- Designed for large-scale reinforcement learning and agentic orchestration/sandboxing.
BlueField-4 STX Rack
- AI-native storage with CMX for KV cache extension into NVMe
Spectrum-6 SPX Rack
- Silicon photonics-based networking for low-latency, resilient multi-rack connectivity
A full Vera Rubin POD scales to 40 racks, 1,152 Rubin GPUs, ~20,000 dies, and 60 exaflops.
Vera Rubin vs Blackwell
Key deltas versus the previous-generation Blackwell NVL72, per NVIDIA's figures:
- Inference: 3.6 EFLOPS NVFP4 at the rack level (5x)
- Training: 2.5 EFLOPS NVFP4 (3.5x)
- Scale-up bandwidth: 260 TB/s via NVLink 6 (2x per-GPU bandwidth)
- Cost per token: 10x lower
When paired with the Groq 3 LPX rack, NVIDIA claims 35x throughput per megawatt for inference, targeting what Jensen Huang framed as "premium" and "ultra" token pricing tiers for high-interactivity agentic AI.
Production Status and Availability
Vera Rubin is in full production. Rubin-based products ship H2 2026.
Cloud providers offering Vera Rubin instances: AWS, Google Cloud, Microsoft (Fairwater AI superfactories), OCI, CoreWeave, Lambda, Nebius, Nscale
Server OEMs building Vera Rubin systems: Cisco, Dell, HPE, Lenovo, Supermicro
AI labs adopting the platform: Anthropic, Cohere, Meta, Mistral AI, OpenAI, Perplexity, Runway, xAI, and others
The HGX Rubin NVL8 (8-GPU server, Intel Xeon 6 host) is also available for enterprise deployments.
Looking ahead: Rubin Ultra with the new Kyber rack (144 GPUs, 600 kW, 15 EFLOPS FP4) expected H2 2027.
Market Context
AMD's Helios rack with MI400-series GPUs also targets H2 2026, claiming 2.9 EFLOPS FP4 and 31 TB HBM4—50% more HBM4 capacity than Vera Rubin NVL72. Hyperscaler custom silicon (Google TPU, AWS Trainium, Meta MTIA) continues advancing on inference.
NVIDIA also announced the Space-1 Vera Rubin Module for orbital data centers, delivering 25x the AI compute of H100 for space-based inference. Six partners (including Axiom Space, Starcloud, and Planet Labs) are deploying on the platform, though no ship date has been set.
What Vera Rubin Means for AI/HPC Infrastructure Buyers
For Bitcoin miners building out or evaluating AI/HPC infrastructure, each new GPU generation resets the compute economics. Vera Rubin's performance-per-watt improvements and rack-scale deployment model are directly relevant to operators assessing diversification strategies. Luxor's Hardware desk sources enterprise GPUs for AI/HPC deployments. Reach out at [email protected] for pricing and availability.