The large language model training task based on GPT-3 set a new record: the NVIDIA H100 accelerator card took only 11 minutes-IT Industry-php.cn

PHPz

Released: 2023-06-28 21:02:02

June 28 news: the booming development of AI has made NVIDIA's graphics cards one of the hottest products on the market. The high-end H100 accelerator card in particular sells for more than 250,000 yuan, yet it remains in short supply. Its performance is equally impressive: in the latest AI benchmark results, a GPT-3-based large language model training task set a new record, finishing in just 11 minutes.

As the editor understands it, MLCommons, an open industry consortium for machine learning and artificial intelligence, has released its latest MLPerf benchmark results. The suite comprises eight workloads, including an LLM training test based on the GPT-3 model, which places heavy demands on a platform's AI performance.

The NVIDIA platform in the test combined 896 Intel Xeon 8462Y processors with 3,584 H100 accelerator cards, and it was the only participating platform to complete all eight tests. It also set a new record: on the key GPT-3-based large language model training task, the H100 platform finished in just 10.94 minutes. By comparison, an Intel platform built from 96 Xeon 8380 processors and 96 Habana Gaudi2 AI chips needed 311.94 minutes for the same test.

That makes the H100 platform nearly 30 times faster than the Intel platform. Admittedly, the two platforms differ greatly in scale. But even when only 768 H100 accelerator cards were used for training, the task took just 45.6 minutes, still far shorter than the time the Intel platform needed with its Gaudi2 AI chips.
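
The ratios behind these comparisons can be checked with a few lines of arithmetic. The sketch below uses only the times and card counts quoted in the article; the "scaling efficiency" metric (measured speedup divided by the ideal linear speedup from adding cards) is our own illustrative choice, not an MLPerf metric. It deliberately avoids per-chip comparisons across vendors, since the two platforms differ so much in scale.

```python
# Back-of-the-envelope check of the MLPerf GPT-3 training times quoted above.
# Times (minutes) and card counts come from the article; the linear-scaling
# baseline used for "scaling efficiency" is an illustrative assumption.

H100_FULL = {"cards": 3584, "minutes": 10.94}
H100_SMALL = {"cards": 768, "minutes": 45.6}
GAUDI2_MINUTES = 311.94  # Intel Xeon 8380 + Habana Gaudi2 platform


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the second run finished than the first."""
    return slow_minutes / fast_minutes


def scaling_efficiency(small: dict, large: dict) -> float:
    """Measured speedup divided by the ideal (linear) speedup from adding cards."""
    ideal = large["cards"] / small["cards"]
    measured = small["minutes"] / large["minutes"]
    return measured / ideal


print(f"3584 x H100 vs Intel platform: {speedup(GAUDI2_MINUTES, H100_FULL['minutes']):.1f}x")
print(f"768 x H100 vs Intel platform:  {speedup(GAUDI2_MINUTES, H100_SMALL['minutes']):.1f}x")
print(f"Scaling efficiency, 768 -> 3584 cards: {scaling_efficiency(H100_SMALL, H100_FULL):.0%}")
```

The first ratio comes out at about 28.5x, matching "nearly 30 times". The efficiency of roughly 89% says that the 3,584-card run finished about 4.2 times faster than the 768-card run while using about 4.7 times as many cards.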

The H100 accelerator card uses the GH100 GPU, manufactured on a customized TSMC 4nm process with 80 billion transistors. The full GH100 chip integrates 18,432 CUDA cores, 576 Tensor cores, and 60MB of L2 cache, and supports a 6,144-bit HBM high-bandwidth memory interface and PCIe 5.0.
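
The full-chip totals also imply how the GH100 is organized internally. The sketch below assumes two facts that come from NVIDIA's Hopper architecture material rather than from this article: each streaming multiprocessor (SM) holds 4 Tensor cores, and each HBM stack exposes a 1,024-bit interface.

```python
# Derive per-SM resources of the full GH100 die from the totals quoted above.
# Assumptions (not stated in the article): 4 Tensor cores per SM,
# and a 1024-bit interface per HBM stack.

TOTAL_CUDA_CORES = 18432
TOTAL_TENSOR_CORES = 576
MEMORY_BUS_BITS = 6144

TENSOR_CORES_PER_SM = 4    # assumed Hopper SM layout
BITS_PER_HBM_STACK = 1024  # assumed HBM stack width

sm_count = TOTAL_TENSOR_CORES // TENSOR_CORES_PER_SM
cuda_cores_per_sm = TOTAL_CUDA_CORES // sm_count
hbm_stacks = MEMORY_BUS_BITS // BITS_PER_HBM_STACK

# Sanity check: the totals should divide evenly under these assumptions.
assert TOTAL_CUDA_CORES % sm_count == 0 and MEMORY_BUS_BITS % BITS_PER_HBM_STACK == 0

print(f"SMs on full GH100: {sm_count}")
print(f"CUDA cores per SM: {cuda_cores_per_sm}")
print(f"HBM stack interfaces: {hbm_stacks}")
```

Under these assumptions the full chip has 144 SMs of 128 CUDA cores each, and the 6,144-bit bus corresponds to six HBM stack interfaces.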

The H100 compute card comes in SXM and PCIe 5.0 form factors. The SXM version has 16,896 CUDA cores and 528 Tensor cores, while the PCIe version has 14,592 CUDA cores and 456 Tensor cores. Power consumption reaches up to 700W on the SXM version.
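
Because the full GH100 has 18,432 CUDA cores for 576 Tensor cores, a ratio of 32 to 1, the CUDA core count of each shipping variant can be derived from its Tensor core count. This sketch assumes that the variants are cut down by disabling whole SMs, so every enabled SM keeps the full-die layout (4 Tensor cores, 128 CUDA cores); that assumption comes from NVIDIA's architecture material, not from the article.

```python
# Cross-check H100 variant core counts using the full-die ratio.
# Assumption: variants disable whole SMs, so each enabled SM keeps
# the full-die layout of 4 Tensor cores and 128 CUDA cores.

FULL_CUDA_CORES, FULL_TENSOR_CORES = 18432, 576
CUDA_PER_TENSOR = FULL_CUDA_CORES // FULL_TENSOR_CORES  # 32
TENSOR_PER_SM = 4  # assumed

VARIANT_TENSOR_CORES = {"SXM": 528, "PCIe": 456}  # from the text


def cuda_cores_from_tensor(tensor_cores: int) -> int:
    """Scale a Tensor core count by the full-die CUDA/Tensor ratio."""
    return tensor_cores * CUDA_PER_TENSOR


for name, tensor_cores in VARIANT_TENSOR_CORES.items():
    sms = tensor_cores // TENSOR_PER_SM
    print(f"{name}: {sms} SMs, {cuda_cores_from_tensor(tensor_cores)} CUDA cores")
```

This yields 132 SMs and 16,896 CUDA cores for the SXM version, and 114 SMs and 14,592 CUDA cores for the PCIe version.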

In terms of performance, the H100 accelerator card delivers 60 TFLOPS (trillion floating-point operations per second) in FP64/FP32 and 2,000 TFLOPS in FP16. It also supports TF32, reaching 1,000 TFLOPS, three times the A100. In FP8, it reaches 4,000 TFLOPS, six times the A100; since the A100 has no FP8 mode, this compares against its FP16 rate. The TF32, FP16, and FP8 figures are peak Tensor core values with structured sparsity.
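
The "three times" and "six times" claims can be reproduced by dividing the H100 peaks by the A100's. The A100 values below are an assumption drawn from NVIDIA's A100 SXM datasheet rather than from this article: 312 TFLOPS TF32 and 624 TFLOPS FP16, both with structured sparsity, so like is compared with like.

```python
# Compare the H100 peak-throughput figures quoted above with A100 peaks.
# Assumption: A100 SXM datasheet values with structured sparsity.
# The A100 has no FP8 mode, so H100 FP8 is compared against A100 FP16.

H100_TFLOPS = {"TF32": 1000, "FP16": 2000, "FP8": 4000}  # from the article
A100_TFLOPS = {"TF32": 312, "FP16": 624}                 # assumed reference values


def ratio(h100_format: str, a100_format: str) -> float:
    """H100 peak in one format divided by A100 peak in another."""
    return H100_TFLOPS[h100_format] / A100_TFLOPS[a100_format]


print(f"TF32 (H100) / TF32 (A100): {ratio('TF32', 'TF32'):.1f}x")
print(f"FP16 (H100) / FP16 (A100): {ratio('FP16', 'FP16'):.1f}x")
print(f"FP8 (H100) / FP16 (A100):  {ratio('FP8', 'FP16'):.1f}x")
```

The TF32 ratio comes out at about 3.2x and the FP8-versus-FP16 ratio at about 6.4x, consistent with the rounded "three times" and "six times" quoted above.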

The above is the detailed content of The large language model training task based on GPT-3 set a new record: the NVIDIA H100 accelerator card took only 11 minutes. For more information, please follow other related articles on the PHP Chinese website!