"InfiniBand: The Leader in High-Performance Networking Technology, Empowering the Future of HPC and AI"

Published December 10, 2024


An In-Depth Look at InfiniBand

What Is InfiniBand?

InfiniBand is a high-performance networking technology designed around RDMA (Remote Direct Memory Access). Through hardware-level offloads, it moves data between servers at high speed and low latency while bypassing the CPU on the data path, significantly reducing CPU load and improving overall system performance.

InfiniBand is used primarily in high-performance computing (HPC) and artificial intelligence (AI) clusters. During distributed training in AI clusters in particular, InfiniBand provides high-speed server-to-server data transfer, making it one of the key technologies for building high-performance computing clusters.


The Evolution of InfiniBand

  • Background: In the 1990s, as computer hardware rapidly evolved, the PCI bus gradually became a system bottleneck. To address this issue, the IBTA (InfiniBand Trade Association) was established to research new alternative technologies, leading to the emergence of InfiniBand.

  • Development History:

    • In 2000, version 1.0 of the InfiniBand architecture specification was officially released.

    • In 2003, InfiniBand began to shift its focus toward computer cluster interconnects.

    • In 2005, InfiniBand was also applied to connecting storage devices.

    • After 2012, driven by growing HPC demands, InfiniBand technology continued to evolve, and its market share increased.

    • In 2015, InfiniBand was used as the interconnect by over 50% of the systems on the TOP500 list, becoming the preferred cluster interconnect technology for supercomputers.

  • Major Vendors: Mellanox is a leading supplier in the InfiniBand market and was acquired by NVIDIA in 2019.


Upgrades in InfiniBand Network Bandwidth

InfiniBand’s network bandwidth has been upgraded repeatedly, progressing from SDR (Single Data Rate), DDR (Double Data Rate), and QDR (Quad Data Rate) through FDR (Fourteen Data Rate) and EDR (Enhanced Data Rate) to HDR (High Data Rate) and NDR (Next Data Rate). Each generation has delivered a significant increase in link bandwidth, supporting ever faster data transmission.
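As a rough illustration of this progression, the sketch below tabulates the commonly quoted nominal per-lane data rates and computes the bandwidth of a standard 4x link. The `link_speed` helper is illustrative only, not part of any InfiniBand API.

```python
# Nominal per-lane data rates (Gb/s) commonly quoted for each
# InfiniBand generation (encoding overhead already factored in).
LANE_RATE_GBPS = {
    "SDR": 2.5,
    "DDR": 5,
    "QDR": 10,
    "FDR": 14,
    "EDR": 25,
    "HDR": 50,
    "NDR": 100,
}

def link_speed(generation: str, lanes: int = 4) -> float:
    """Nominal bandwidth in Gb/s of a link with the given lane count
    (4x is the most common link width)."""
    return LANE_RATE_GBPS[generation] * lanes

for gen in LANE_RATE_GBPS:
    print(f"{gen}: 4x link = {link_speed(gen):g} Gb/s")
# SDR 10, DDR 20, QDR 40, FDR 56, EDR 100, HDR 200, NDR 400
```

This makes the headline numbers easy to verify: an NDR 4x link carries 100 Gb/s per lane across 4 lanes, which is the 400 Gb/s figure quoted for current switches and adapters.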

InfiniBand Networking

InfiniBand achieves lossless transmission through link-level, credit-based flow control; to give any two compute nodes a full-bandwidth path, InfiniBand networks typically employ a fat-tree topology. This architecture achieves efficient data forwarding and access through a combination of core (spine) and edge (leaf) switches. For example, a typical InfiniBand topology for 32 H100 servers shows how the network is built from core and edge IB switches.
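The switch count for such a fabric follows directly from the switch radix. The back-of-envelope sketch below sizes a non-blocking two-tier fat tree, assuming radix-64 switches (as in the QM9700 class) and one network adapter per GPU; the `two_tier_fat_tree` helper is a hypothetical illustration, not a vendor sizing tool.

```python
import math

def two_tier_fat_tree(num_endpoints: int, switch_ports: int = 64):
    """Size a non-blocking two-tier (leaf/spine) fat tree.

    Each leaf switch dedicates half its ports to endpoints and the
    other half to uplinks, so a radix-64 switch serves 32 endpoints.
    Returns (leaf_switches, spine_switches).
    """
    down = switch_ports // 2                   # endpoint-facing ports per leaf
    leaves = math.ceil(num_endpoints / down)   # edge switches needed
    # Every leaf sends `down` uplinks; the spine layer must absorb them all.
    spines = math.ceil(leaves * down / switch_ports)
    return leaves, spines

# 32 H100 servers with 8 adapters each -> 256 network endpoints
leaves, spines = two_tier_fat_tree(32 * 8, switch_ports=64)
print(leaves, spines)  # 8 leaf and 4 spine switches
```

Under these assumptions, the 32-server example resolves to 8 edge and 4 core switches; real deployments may add spare capacity or use multiple rails, which changes the totals.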


InfiniBand Commercial Products

Mellanox (now part of NVIDIA) is a leading supplier in the global InfiniBand market. Its seventh-generation NVIDIA InfiniBand architecture includes the NVIDIA Quantum-2 series of switches, NVIDIA ConnectX-7 InfiniBand adapters, BlueField-3 InfiniBand DPUs, and dedicated InfiniBand cables. These products offer bidirectional throughput of up to 51.2 Tb/s and support NDR 400 Gb/s InfiniBand ports, making them an ideal choice for building high-performance computing clusters.

  • Switches: Models such as the QM9700 and QM9790 provide 64 NDR 400 Gb/s InfiniBand ports, which can alternatively be split into 128 ports at 200 Gb/s.

  • Adapters: Such as the ConnectX-7, used to connect servers to InfiniBand networks.

  • Cables: These include DAC high-speed copper cables (short transmission distances, relatively inexpensive) and AOC active optical cables (long transmission distances, higher cost).



In summary, as a high-performance networking technology, InfiniBand plays a vital role in the fields of high-performance computing and artificial intelligence. Through continuous technological advancements and the launch of commercial products, InfiniBand will continue to support the development of efficient and stable computing networks.

