Explanation of the 512-H100 Cluster Networking Solution

Published December 5, 2024

A total of 256 servers represents the limit of a two-tier Spine-Leaf architecture; exceeding 256 servers requires a three-tier architecture, specifically Core-Spine-Leaf. Therefore, the 512-server H100 network configuration introduced here is designed as a three-tier IB network.

In the three-tier architecture, the Core layer serves as the core switching layer, responsible for high-speed data forwarding and aggregation; the Spine layer acts as the backbone layer, providing high-speed connectivity and forwarding; and the Leaf layer functions as the access layer, connecting servers to the network.

Given the exceptionally high data transmission requirements for large-scale model training, the compute network is designed with a global no-blocking architecture and utilizes a 400 Gb/s IB network (NDR), while the storage network employs a 200 Gb/s IB network (HDR). The complete network architecture diagram for the 512-node cluster is shown below.

[Figure: Overall network architecture of the 512-node cluster]

I. Computing Network

The 512 H100 servers are divided into 4 SuperPods, with each SuperPod containing 4 SU units, and each SU unit containing 32 H100 servers. This means each SuperPod has 128 servers.

Every 4 Leaf switches and 4 Spine switches form a Rail Group. Each SuperPod comprises 8 Rail Groups, for a total of 32 Leaf switches and 32 Spine switches, plus 16 Core switches attributed to the Core layer. Thus, each SuperPod accounts for 32 + 32 + 16 = 80 IB switches, and the 4 SuperPods together require 80 × 4 = 320 IB switches.
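The switch accounting above can be verified with a quick arithmetic sketch (all figures are taken from the text, including the article's convention of attributing 16 Core switches to each SuperPod):

```python
# Switch-count check for the compute fabric (figures from the text).
SERVERS_TOTAL = 512
SUPERPODS = 4
SUS_PER_SUPERPOD = 4
SERVERS_PER_SU = 32

servers_per_superpod = SUS_PER_SUPERPOD * SERVERS_PER_SU  # 4 x 32 = 128
assert SUPERPODS * servers_per_superpod == SERVERS_TOTAL

# Each Rail Group pairs 4 Leaf switches with 4 Spine switches;
# a SuperPod comprises 8 Rail Groups.
RAIL_GROUPS_PER_SUPERPOD = 8
leafs_per_superpod = RAIL_GROUPS_PER_SUPERPOD * 4   # 32
spines_per_superpod = RAIL_GROUPS_PER_SUPERPOD * 4  # 32
cores_per_superpod = 16  # Core-layer share attributed to each SuperPod

switches_per_superpod = leafs_per_superpod + spines_per_superpod + cores_per_superpod
switches_total = switches_per_superpod * SUPERPODS
print(switches_per_superpod, switches_total)  # 80 320
```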

Each H100 server is configured with eight 400 Gb/s network interface cards (NICs) and uses a multi-rail networking architecture: each server's eight NICs are connected to 8 different Leaf switches.
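The wiring rule can be sketched as follows. The leaf-switch naming scheme here is hypothetical; the invariant it illustrates is simply that NIC i of every server in an SU lands on the same Leaf switch, forming rail i:

```python
# Multi-rail wiring sketch: NIC i of every server in an SU connects to Leaf i.
NICS_PER_SERVER = 8  # eight 400 Gb/s NICs per H100 server

def leaf_for_nic(su_id: int, nic_index: int) -> str:
    """Return a (hypothetical) leaf-switch label for a given server NIC."""
    assert 0 <= nic_index < NICS_PER_SERVER
    return f"SU{su_id}-Leaf{nic_index}"

# Every server in SU 0 spreads its 8 NICs across 8 distinct leaves:
rails = {leaf_for_nic(0, nic) for nic in range(NICS_PER_SERVER)}
print(len(rails))  # 8
```

The payoff of this layout is that same-rail traffic (e.g. all-reduce between GPU i on different servers) stays within one Leaf switch hop.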

The three-tier Core-Spine-Leaf network topology diagram is as follows:

[Figure: Three-tier Core-Spine-Leaf network topology]

II. Storage Network

The storage system for the 512-node cluster is divided into two parts: high-performance storage and mass storage. High-performance storage uses all-flash drives, configured at a ratio of 1 TB per GPU; with 8 GPUs per server, the 512 H100 servers therefore require 4 PB of high-performance storage. Mass storage is configured at 4–5 times the capacity of high-performance storage, with a planned usable capacity of 20 PB.
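The capacity figures follow directly from the sizing rules in the text:

```python
# Storage sizing from the text: 1 TB of all-flash per GPU,
# mass tier planned at the 5x end of the 4-5x range.
SERVERS = 512
GPUS_PER_SERVER = 8
TB_PER_GPU = 1

high_perf_tb = SERVERS * GPUS_PER_SERVER * TB_PER_GPU  # 4096 TB = 4 PB
mass_tb = 5 * high_perf_tb                             # 20480 TB = 20 PB
print(high_perf_tb // 1024, mass_tb // 1024)  # 4 20 (PB, binary units)
```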

[Figure: Storage configuration of the 512-node cluster]

The storage network utilizes a 200 Gb/s InfiniBand (HDR) fabric; each H100 server is equipped with one 200 Gb/s IB NIC as its storage access port. The network adopts a two-tier Spine-Leaf architecture with a global 1:1 convergence ratio, requiring 37 Leaf switches and 20 Spine switches. QM8700-class IB switches are sufficient for this configuration.
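A QM8700 provides 40 HDR 200 Gb/s ports, so at a 1:1 convergence ratio each Leaf splits them evenly: 20 downlinks and 20 uplinks. A quick port-budget sketch for the server side alone (the article's total of 37 Leaf switches also attaches the storage servers themselves, which are not enumerated in the text):

```python
# Leaf port budget at a 1:1 convergence ratio on 40-port QM8700 HDR switches.
import math

PORTS_PER_SWITCH = 40            # QM8700: 40x HDR 200 Gb/s ports
down_per_leaf = PORTS_PER_SWITCH // 2  # 1:1 ratio: 20 down, 20 up

server_ports = 512               # one HDR storage NIC per H100 server
leafs_for_servers = math.ceil(server_ports / down_per_leaf)
print(leafs_for_servers)  # 26; the remaining leaves attach storage nodes
```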

[Figure: Storage network two-tier Spine-Leaf topology]
