Data communication for AI servers involves three components: internal server communication, communication between servers within an AI cluster, and wide-area communication across clusters.
High-speed communication between GPUs within a server primarily uses NVLink. NVIDIA also uses NVLink to build SuperPOD clusters, but the number of GPUs an NVLink domain can span is relatively limited, so it is mainly suited to interconnecting small numbers of server nodes. Large-scale AI clusters instead rely on RDMA networks, specifically RoCE or InfiniBand.
This article uses a typical NVIDIA A100 server as an example to detail the interconnect architecture between its various components. The internal network configuration of the A100 server is shown in the figure below:

The main modules of the A100 server include: 2 CPUs, 2 InfiniBand storage network interface cards (BF3 DPUs), 4 PCIe Gen4 switch chips, 6 NVSwitch chips, 8 GPUs (A100), and 8 InfiniBand network interface cards. The 8 GPUs are connected in a full-mesh configuration via the 6 NVSwitch chips.
1. Between GPUs within the host, NVLink is used: the A100’s bidirectional bandwidth is 12 × 50 GB/s = 600 GB/s; the A800, a stripped-down variant, is reduced to 8 × 50 GB/s = 400 GB/s.
2. Between GPUs and NICs within the host: GPU <--> PCIe Switch <--> NIC, with a theoretical unidirectional bandwidth of 32 GB/s (PCIe Gen4 x16).
3. Between GPUs across hosts: data is transmitted via InfiniBand NICs, as shown in the figure below:

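The bandwidth figures above can be reproduced with a short back-of-envelope calculation. This is a minimal sketch using only numbers quoted in the text plus the standard PCIe Gen4 parameters (16 GT/s per lane, 128b/130b encoding); the function names are illustrative, not from any vendor API.

```python
NVLINK3_PER_LINK_GBS = 50  # GB/s bidirectional per NVLink 3.0 link (from the text)

def nvlink_bw(num_links: int, per_link_gbs: float = NVLINK3_PER_LINK_GBS) -> float:
    """Aggregate bidirectional NVLink bandwidth in GB/s."""
    return num_links * per_link_gbs

a100_bw = nvlink_bw(12)  # A100: 12 links -> 600 GB/s
a800_bw = nvlink_bw(8)   # A800 (stripped-down variant): 8 links -> 400 GB/s

# PCIe Gen4 x16: 16 GT/s per lane, 16 lanes, 128b/130b encoding, 8 bits per byte
pcie_gen4_x16_unidir = 16 * 16 * (128 / 130) / 8  # ~31.5 GB/s, quoted as ~32 GB/s

print(a100_bw, a800_bw, round(pcie_gen4_x16_unidir, 1))  # 600 400 31.5
```

The ~31.5 GB/s figure shows why the article rounds the GPU-to-NIC path to "32 GB/s": the encoding overhead of 128b/130b shaves only about 1.5% off the raw signaling rate.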
Whether it is the compute network or the storage network, RDMA is required to meet the high-performance demands of AI. The network adopts a Spine-Leaf architecture: 8 GPUs are directly connected to Leaf switches via InfiniBand NICs (HDR, 200 Gbps), and the Leaf switches are connected to Spine switches via a full-mesh topology, forming a cross-host GPU compute network.
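The Spine-Leaf sizing implied above can be sketched as follows. This is a rough model assuming 1:1 oversubscription (half of each leaf's ports face servers, half face spines) and one uplink per spine per leaf for the full mesh; the switch port counts are hypothetical, not taken from the article.

```python
def spine_leaf_capacity(leaf_ports: int, num_leaves: int) -> dict:
    """Non-blocking (1:1) Spine-Leaf sizing: half of each leaf's ports
    connect downward to GPU NICs, half connect upward, one link per spine."""
    down = leaf_ports // 2          # server-facing (GPU NIC) ports per leaf
    up = leaf_ports - down          # spine-facing ports per leaf
    return {
        "gpu_ports": down * num_leaves,  # GPU NICs the fabric can attach
        "spines_needed": up,             # one uplink per spine (full mesh)
    }

# Hypothetical example: 40-port HDR leaf switches, 16 leaves
print(spine_leaf_capacity(40, 16))  # {'gpu_ports': 320, 'spines_needed': 20}
```

With 8 NICs per server, 320 GPU-facing ports would correspond to 40 eight-GPU hosts in this toy configuration; real deployments also reserve ports for management and storage.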
The reason the A100 uses HDR InfiniBand NICs is that HDR’s 200 Gbps (i.e., 25 GB/s) unidirectional bandwidth is already close to PCIe Gen4’s theoretical 32 GB/s unidirectional limit on the path to the NIC. A higher-end NDR NIC (400 Gbps, i.e., 50 GB/s unidirectional) would exceed that PCIe ceiling, so it would offer little additional benefit.
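The bottleneck argument can be checked numerically. A minimal sketch using only the figures quoted in the text:

```python
def gbps_to_gbytes(gbps: float) -> float:
    """Convert link rate in Gbps to GB/s (8 bits per byte)."""
    return gbps / 8

hdr = gbps_to_gbytes(200)   # HDR: 25 GB/s unidirectional
ndr = gbps_to_gbytes(400)   # NDR: 50 GB/s unidirectional
pcie_gen4_x16 = 32          # GB/s unidirectional ceiling on the GPU <-> NIC path

assert hdr <= pcie_gen4_x16  # HDR fits under the PCIe Gen4 path to the NIC
assert ndr > pcie_gen4_x16   # NDR would be throttled by PCIe Gen4

print(hdr, ndr, pcie_gen4_x16)  # 25.0 50.0 32
```

In other words, an NDR NIC behind a PCIe Gen4 x16 slot could never be fed faster than ~32 GB/s, wasting roughly a third of its line rate.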
Conclusion:
As a native RDMA fabric, InfiniBand excels at providing lossless, low-latency transport. However, its ecosystem is relatively closed and costly (at equivalent bandwidth, InfiniBand outperforms RoCE by over 20% but costs roughly twice as much). Therefore, InfiniBand is primarily suitable for small-to-medium-scale cluster scenarios.
RoCE, on the other hand, leverages its mature Ethernet ecosystem, low networking costs, and rapid technological iteration, making it more suitable for medium-to-large-scale training clusters. For example, the 8-GPU servers currently sold by public cloud providers almost exclusively use RoCE networks.