NVIDIA Unveils the Vera Rubin NVL72: Full Specs and Platform Breakdown
Seven chips, five racks, 60 exaflops. Everything announced in NVIDIA's Vera Rubin platform at GTC 2026.
NVIDIA announced the Vera Rubin platform at GTC 2026: a seven-chip, five-rack AI supercomputer architecture designed for agentic AI workloads. The platform is in full production as of Q1 2026, with partner availability in H2 2026.
TLDR
- Vera Rubin NVL72 packs 72 Rubin GPUs and 36 Vera CPUs into a single liquid-cooled rack delivering 3.6 EFLOPS of NVFP4 inference and 2.5 EFLOPS of training compute.
- Seven co-designed chips span compute, networking, storage, and security—including the Groq 3 LPU from NVIDIA's $20B acqui-hire of Groq.
- NVIDIA claims 5x inference performance and 10x lower cost per token versus Blackwell at the rack level.
- A full Vera Rubin POD scales to 40 racks, 1,152 GPUs, and 60 exaflops.
- First deployments expected from AWS, Google Cloud, Microsoft, OCI, and CoreWeave in H2 2026.
Platform Overview
The Vera Rubin platform comprises seven chips across five rack-scale systems. Each chip serves a dedicated function across compute, networking, and data management.
The Seven Chips
Below is a detailed breakdown of the specifications for each of the seven co-designed chips that form the Vera Rubin platform.
Rubin GPU
- Process: TSMC 3nm, dual-die design
- Transistors: 336 billion
- Memory: 288 GB HBM4, 22 TB/s bandwidth
- Inference Performance: 50 PFLOPS NVFP4 (5x Blackwell)
- Training Performance: 35 PFLOPS NVFP4 (3.5x Blackwell)
- Transformer Engine: 3rd generation with hardware-accelerated adaptive compression
- Power Profiles: Max Q (~1.8 kW TGP) and Max P (~2.3 kW TGP)
Vera CPU
- Cores: 88 custom Arm-based Olympus cores (Armv9.2), 176 threads via Spatial Multithreading
- Memory: Up to 1.5 TB SOCAMM LPDDR5X, 1.2 TB/s bandwidth
- NVLink-C2C: 1.8 TB/s coherent bandwidth (7x PCIe Gen 6)
- Role: Orchestration, workload scheduling, KV cache routing, agentic workflow control plane
- NVIDIA's first standalone data center CPU. Jensen Huang called it "already for sure going to be a multi-billion dollar business."
Groq 3 LPU (LP30)
- Origin: NVIDIA's $20B acqui-hire of Groq (December 2025); replaces the previously announced Rubin CPX
- Generation: 3rd generation (manufactured by Samsung)
- On-Chip Memory: ~500 MB stacked SRAM per chip, ~80 TB/s bandwidth per chip
- Function: Decode-phase inference acceleration—Rubin GPUs handle prefill (processing input contexts), Groq 3 LPUs handle decode (token generation at low latency)
- When paired with Rubin NVL72, NVIDIA claims 35x higher inference throughput per megawatt for trillion-parameter models.
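The prefill/decode split described above can be sketched in a few lines. This is an illustrative mock of the serving pattern only, assuming nothing about NVIDIA's or Groq's actual software stack; all names (`KVCache`, `prefill`, `decode`) are hypothetical, and the "model" is a stand-in computation.

```python
# Hypothetical sketch of disaggregated prefill/decode serving: a
# compute-bound prefill stage (the Rubin GPU role) builds the KV cache
# over the whole input context, then a latency-sensitive decode stage
# (the Groq 3 LPU role) generates tokens one at a time from that cache.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    prompt_tokens: list[int]
    generated: list[int] = field(default_factory=list)

def prefill(prompt_tokens: list[int]) -> KVCache:
    """Prefill: process the full input context in one parallel pass."""
    return KVCache(prompt_tokens=list(prompt_tokens))

def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Decode: emit one token per step, reading the cached context."""
    for _ in range(max_new_tokens):
        # Stand-in for a real forward pass over the cached context.
        next_tok = (sum(cache.prompt_tokens) + len(cache.generated)) % 50_000
        cache.generated.append(next_tok)
    return cache.generated

cache = prefill([101, 7592, 2088, 102])   # prefill tier builds the cache
tokens = decode(cache, max_new_tokens=4)  # decode tier generates tokens
print(len(tokens))  # 4
```

The point of the split is that the two stages have opposite bottlenecks (FLOPS vs memory bandwidth), so disaggregating them lets each run on hardware sized for its bottleneck.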
NVLink 6 Switch
- Bandwidth: 3.6 TB/s bidirectional per GPU (2x Blackwell)
- Per Tray: 28.8 TB/s switching bandwidth, 14.4 TFLOPS FP8 in-network compute
- Per NVL72 Rack: 9 switch trays, 260 TB/s aggregate scale-up bandwidth
- In-network compute accelerates all-to-all collective operations for MoE routing.
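The all-to-all collective mentioned above is the core data movement in MoE routing. The sketch below emulates that exchange in plain Python to show the communication pattern; in a real deployment this would be an NCCL all-to-all across GPUs, and the token/expert setup here is invented for illustration.

```python
# Illustrative all-to-all token dispatch for MoE routing: each rank
# buckets its tokens by destination expert, then the collective delivers
# every bucket to the rank hosting that expert.

def all_to_all(send_buffers):
    """send_buffers[i][j] = data rank i sends to rank j.
    Returns recv_buffers where recv_buffers[j][i] is that same data."""
    n = len(send_buffers)
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]

# 2 ranks, each holding 3 tokens routed to one of 2 experts
# (expert e is assumed to live on rank e).
tokens_per_rank = [[("t0", 0), ("t1", 1), ("t2", 0)],   # rank 0's tokens
                   [("t3", 1), ("t4", 0), ("t5", 1)]]   # rank 1's tokens

# Bucket each rank's tokens by destination rank...
send = [[[t for t, e in toks if e == dst] for dst in range(2)]
        for toks in tokens_per_rank]
# ...then exchange buckets so every expert sees all of its tokens.
recv = all_to_all(send)
print(recv[0])  # [['t0', 't2'], ['t4']]  -> expert 0's tokens
print(recv[1])  # [['t1'], ['t3', 't5']]  -> expert 1's tokens
```

Because every rank talks to every other rank each layer, this collective is latency- and bandwidth-critical, which is why pushing parts of it into the switch fabric pays off.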
ConnectX-9 SuperNIC
- Throughput: 1.6 Tb/s per GPU, 8 NICs per compute tray
- Function: Scale-out connectivity (rack-to-rack) via Spectrum-X Ethernet or Quantum-X800 InfiniBand
BlueField-4 DPU
- Architecture: Dual-die design pairing a 64-core Grace CPU with an integrated ConnectX-9 NIC
- vs BF-3: 2x bandwidth, 3x memory bandwidth, 6x compute
- Function: Offloads networking, storage, encryption, and security enforcement
- Powers the new Inference Context Memory Storage Platform (CMX), extending GPU memory into NVMe storage for KV cache data.
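The tiering idea behind CMX can be illustrated with a toy two-level cache: hot KV blocks stay in fast GPU memory, cold ones spill to NVMe and are promoted back on access. This is a minimal sketch under assumed behavior; the class name, the simple LRU policy, and the dict-backed "tiers" are illustrative, not the CMX implementation.

```python
# Toy two-tier KV cache: an LRU-ordered fast tier (standing in for HBM)
# that evicts its coldest blocks to a slow tier (standing in for NVMe
# reached through the DPU), with promotion back on access.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity_blocks: int):
        self.hbm = OrderedDict()   # fast tier, LRU-ordered
        self.nvme = {}             # slow tier
        self.capacity = hbm_capacity_blocks

    def put(self, block_id, kv_block):
        self.hbm[block_id] = kv_block
        self.hbm.move_to_end(block_id)
        while len(self.hbm) > self.capacity:          # spill coldest block
            cold_id, cold_block = self.hbm.popitem(last=False)
            self.nvme[cold_id] = cold_block

    def get(self, block_id):
        if block_id in self.hbm:                      # fast-tier hit
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        kv_block = self.nvme.pop(block_id)            # slow-tier hit: promote
        self.put(block_id, kv_block)
        return kv_block

cache = TieredKVCache(hbm_capacity_blocks=2)
for i in range(3):
    cache.put(f"blk{i}", f"kv{i}")  # blk0 spills to the slow tier
print(sorted(cache.nvme))            # ['blk0']
cache.get("blk0")                    # promoted back; blk1 spills
print(sorted(cache.nvme))            # ['blk1']
```

The payoff is capacity: long agentic contexts can keep far more KV state resident than HBM alone allows, at the cost of occasional promotion latency.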
Spectrum-6 Ethernet Switch
- Aggregate Bandwidth: 102.4 Tb/s
- Co-Packaged Optics: First NVIDIA switch with CPO, co-developed with TSMC ("COUPE")
- Efficiency: 5x power efficiency, 10x resiliency vs prior Spectrum-X generations
The Five Rack Systems
Below is a breakdown of the five rack-scale systems that form the Vera Rubin platform.
Vera Rubin NVL72
- GPUs: 72 Rubin GPUs + 36 Vera CPUs
- Inference: 3.6 EFLOPS NVFP4
- Training: 2.5 EFLOPS NVFP4
- HBM4: 20.7 TB capacity, 1.6 PB/s bandwidth
- LPDDR5X: 54 TB
- Scale-Up Bandwidth: 260 TB/s (NVLink 6)
- Cooling: 100% liquid, 45°C inlet water (enables free cooling without chillers in most climates)
- Assembly: Full rack assembled in ~2 hours (down from ~2 days); compute tray swap in ~5 minutes
- Rack: ~4,000 lbs, ~1,300 chips, 1.3M components
- Power: ~190 kW (Max Q) / ~230 kW (Max P)
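The rack-level figures above follow directly from the per-chip specs earlier in this article; a quick back-of-envelope check (quoted figures appear lightly rounded):

```python
# Sanity-checking the NVL72 rack aggregates from the per-chip specs.
GPUS, CPUS = 72, 36

inference_eflops = GPUS * 50 / 1000        # 50 PFLOPS NVFP4 per Rubin GPU
training_eflops  = GPUS * 35 / 1000        # 35 PFLOPS NVFP4 per Rubin GPU
hbm_tb           = GPUS * 288 / 1000       # 288 GB HBM4 per GPU
hbm_bw_pbs       = GPUS * 22 / 1000        # 22 TB/s HBM4 bandwidth per GPU
lpddr_tb         = CPUS * 1.5              # up to 1.5 TB SOCAMM per Vera CPU
nvlink_tbs       = round(GPUS * 3.6, 1)    # 3.6 TB/s NVLink 6 per GPU

print(inference_eflops)  # 3.6    EFLOPS (quoted: 3.6)
print(training_eflops)   # 2.52   EFLOPS (quoted: 2.5)
print(hbm_tb)            # 20.736 TB     (quoted: 20.7)
print(hbm_bw_pbs)        # 1.584  PB/s   (quoted: 1.6)
print(lpddr_tb)          # 54.0   TB     (quoted: 54)
print(nvlink_tbs)        # 259.2  TB/s   (quoted: 260)
```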
Groq 3 LPX Rack
- LPUs: 256
- On-Chip SRAM: 128 GB aggregate
- Memory Bandwidth: 40 PB/s
- Scale-Up Bandwidth: 640 TB/s per rack
- Designed to pair with NVL72 for disaggregated prefill/decode inference.
Vera CPU Rack
- CPUs: 256 Vera CPUs
- Designed for large-scale reinforcement learning and agentic orchestration/sandboxing.
BlueField-4 STX Rack
- AI-native storage with CMX for KV cache extension into NVMe
Spectrum-6 SPX Rack
- Silicon photonics-based networking for low-latency, resilient multi-rack connectivity
A full Vera Rubin POD scales to 40 racks, 1,152 Rubin GPUs, ~20,000 dies, and 60 exaflops.
Vera Rubin vs Blackwell
Key deltas versus the previous-generation Blackwell NVL72, per NVIDIA's figures:
- Inference: 3.6 EFLOPS NVFP4 at the rack level (5x)
- Training: 2.5 EFLOPS NVFP4 (3.5x)
- Scale-up bandwidth: 260 TB/s via NVLink 6 (2x per-GPU bandwidth)
- Cost per token: 10x lower
When paired with the Groq 3 LPX rack, NVIDIA claims 35x throughput per megawatt for inference, targeting what Jensen Huang framed as "premium" and "ultra" token pricing tiers for high-interactivity agentic AI.
Production Status and Availability
Vera Rubin is in full production. Rubin-based products ship H2 2026.
Cloud providers offering Vera Rubin instances: AWS, Google Cloud, Microsoft (Fairwater AI superfactories), OCI, CoreWeave, Lambda, Nebius, Nscale
Server OEMs building Vera Rubin systems: Cisco, Dell, HPE, Lenovo, Supermicro
AI labs adopting the platform: Anthropic, Cohere, Meta, Mistral AI, OpenAI, Perplexity, Runway, xAI, and others
The HGX Rubin NVL8 (8-GPU server, Intel Xeon 6 host) is also available for enterprise deployments.
Looking ahead: Rubin Ultra with the new Kyber rack (144 GPUs, 600 kW, 15 EFLOPS FP4) expected H2 2027.
Market Context
AMD's Helios rack with MI400-series GPUs also targets H2 2026, claiming 2.9 EFLOPS FP4 and 31 TB HBM4—50% more HBM4 capacity than Vera Rubin NVL72. Hyperscaler custom silicon (Google TPU, AWS Trainium, Meta MTIA) continues advancing on inference.
NVIDIA also announced the Space-1 Vera Rubin Module for orbital data centers, delivering 25x the AI compute of H100 for space-based inference. Six partners (including Axiom Space, Starcloud, and Planet Labs) are deploying on the platform, though no ship date has been set.
What Vera Rubin Means for AI/HPC Infrastructure Buyers
For Bitcoin miners building out or evaluating AI/HPC infrastructure, each new GPU generation resets the compute economics. Vera Rubin's performance-per-watt improvements and rack-scale deployment model are directly relevant to operators assessing diversification strategies. Luxor's Hardware desk sources enterprise GPUs for AI/HPC deployments. Reach out at [email protected] for pricing and availability.