"InfiniBand: The Leader in High-Performance Networking Technology, Empowering the Future of HPC and AI"

Published December 10, 2024


An In-Depth Look at InfiniBand

What Is InfiniBand?

InfiniBand is a high-performance networking technology designed around RDMA (Remote Direct Memory Access). Through hardware-level offloads, it moves data between servers at high speed and low latency while bypassing the CPU on the data path, significantly reducing CPU load and improving overall system performance.

InfiniBand is used primarily in high-performance computing (HPC) and artificial intelligence (AI) clusters. During distributed training in AI clusters in particular, InfiniBand provides high-speed server-to-server data transfer, making it one of the key technologies for building high-performance computing clusters.


The Evolution of InfiniBand

  • Background: In the 1990s, as computer hardware rapidly evolved, the PCI bus gradually became a system bottleneck. To address this issue, the IBTA (InfiniBand Trade Association) was established to research new alternative technologies, leading to the emergence of InfiniBand.

  • Development History:

    • In 2000, version 1.0 of the InfiniBand architecture specification was officially released.

    • In 2003, InfiniBand began to shift its focus toward computer cluster interconnects.

    • In 2005, InfiniBand was also applied to connecting storage devices.

    • After 2012, driven by growing HPC demands, InfiniBand technology continued to evolve, and its market share increased.

    • In 2015, InfiniBand was used as the interconnect by over 50% of the systems on the TOP500 list, becoming the preferred cluster interconnect technology for supercomputers.

  • Major Vendors: Mellanox is a leading supplier in the InfiniBand market and was acquired by NVIDIA in 2019.


Upgrades in InfiniBand Network Bandwidth

InfiniBand’s network bandwidth has been upgraded repeatedly, progressing from SDR (Single Data Rate), DDR (Double Data Rate), and QDR (Quad Data Rate) through FDR (Fourteen Data Rate) and EDR (Enhanced Data Rate) to HDR (High Data Rate) and NDR (Next Data Rate). Each generation has delivered a significant increase in link bandwidth, supporting ever faster data transmission.
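As a rough illustration of this progression, the sketch below tabulates the commonly quoted nominal per-lane data rates and computes the bandwidth of a standard 4x link. The `link_speed` helper is illustrative only, not part of any InfiniBand API.

```python
# Nominal per-lane data rates (Gb/s) commonly quoted for each
# InfiniBand generation (encoding overhead already factored in).
LANE_RATE_GBPS = {
    "SDR": 2.5,
    "DDR": 5,
    "QDR": 10,
    "FDR": 14,
    "EDR": 25,
    "HDR": 50,
    "NDR": 100,
}

def link_speed(generation: str, lanes: int = 4) -> float:
    """Nominal bandwidth in Gb/s of a link with the given lane count
    (4x is the most common link width)."""
    return LANE_RATE_GBPS[generation] * lanes

for gen in LANE_RATE_GBPS:
    print(f"{gen}: 4x link = {link_speed(gen):g} Gb/s")
# SDR 10, DDR 20, QDR 40, FDR 56, EDR 100, HDR 200, NDR 400
```

This makes the headline numbers easy to verify: an NDR 4x link carries 100 Gb/s per lane across 4 lanes, which is the 400 Gb/s figure quoted for current switches and adapters.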

InfiniBand Networking

InfiniBand achieves lossless transmission through link-level, credit-based flow control; to give any two compute nodes a full-bandwidth path, InfiniBand networks typically employ a fat-tree topology. This architecture achieves efficient data forwarding and access through a combination of core (spine) and edge (leaf) switches. For example, a typical InfiniBand topology for 32 H100 servers shows how the network is built from core and edge IB switches.
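The switch count for such a fabric follows directly from the switch radix. The back-of-envelope sketch below sizes a non-blocking two-tier fat tree, assuming radix-64 switches (as in the QM9700 class) and one network adapter per GPU; the `two_tier_fat_tree` helper is a hypothetical illustration, not a vendor sizing tool.

```python
import math

def two_tier_fat_tree(num_endpoints: int, switch_ports: int = 64):
    """Size a non-blocking two-tier (leaf/spine) fat tree.

    Each leaf switch dedicates half its ports to endpoints and the
    other half to uplinks, so a radix-64 switch serves 32 endpoints.
    Returns (leaf_switches, spine_switches).
    """
    down = switch_ports // 2                   # endpoint-facing ports per leaf
    leaves = math.ceil(num_endpoints / down)   # edge switches needed
    # Every leaf sends `down` uplinks; the spine layer must absorb them all.
    spines = math.ceil(leaves * down / switch_ports)
    return leaves, spines

# 32 H100 servers with 8 adapters each -> 256 network endpoints
leaves, spines = two_tier_fat_tree(32 * 8, switch_ports=64)
print(leaves, spines)  # 8 leaf and 4 spine switches
```

Under these assumptions, the 32-server example resolves to 8 edge and 4 core switches; real deployments may add spare capacity or use multiple rails, which changes the totals.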


InfiniBand Commercial Products

Mellanox (now part of NVIDIA) is a leading supplier in the global InfiniBand market. Its seventh-generation NVIDIA InfiniBand architecture includes the NVIDIA Quantum-2 series of switches, NVIDIA ConnectX-7 InfiniBand adapters, BlueField-3 InfiniBand DPUs, and dedicated InfiniBand cables. These products offer bidirectional throughput of up to 51.2 Tb/s and support NDR 400 Gb/s InfiniBand ports, making them an ideal choice for building high-performance computing clusters.

  • Switches: Models such as the QM9700 and QM9790 provide 64 NDR 400 Gb/s InfiniBand ports, which can alternatively be split into 128 ports at 200 Gb/s.

  • Adapters: Such as the ConnectX-7, used to connect servers to InfiniBand networks.

  • Cables: These include DAC high-speed copper cables (short transmission distances, relatively inexpensive) and AOC active optical cables (long transmission distances, higher cost).



In summary, as a high-performance networking technology, InfiniBand plays a vital role in the fields of high-performance computing and artificial intelligence. Through continuous technological advancements and the launch of commercial products, InfiniBand will continue to support the development of efficient and stable computing networks.

