
How to maximize GPU performance

WBOY
Release: 2023-08-31 17:09:09

The default way to speed up an artificial intelligence project is to add more GPUs to the cluster. But as GPU supply grows increasingly tight, costs keep climbing. It is understandable that many AI companies spend more than 80% of the capital they raise on computing resources: GPUs are the heart of AI infrastructure and deserve a large share of the budget. Given those high costs, however, there are other ways to improve GPU performance that deserve consideration before, or alongside, expanding the cluster.

Maximizing GPU performance is not an easy task, especially as the explosive growth of generative AI has led to a GPU shortage. NVIDIA A100 GPUs were among the first affected and are now extremely scarce, with some versions quoting lead times of up to a year. These supply chain challenges have pushed many buyers toward the higher-end H100 as an alternative, at an obviously higher price. For entrepreneurs investing in their own infrastructure to create the next great generative AI solution for their industry, squeezing every drop of efficiency out of existing GPUs is a necessity.

Let's look at how enterprises can get more out of their compute investment through changes to the network and storage design of their AI infrastructure.

Data Matters

Optimizing the utilization of existing computing infrastructure is an important approach. To maximize GPU utilization, slow data delivery must be addressed so that the GPU stays busy under load. Some users see GPU utilization as low as 20%, which is unacceptable, and as a result AI teams are looking for the best ways to maximize the return on their AI investment.

GPUs are the engine of AI. Just as a car engine needs gasoline to run, a GPU needs data to compute. Limit the data flow and you limit GPU performance. If a GPU runs at only 50% efficiency, the AI team's productivity drops, projects take twice as long to complete, and the return on investment is effectively halved. Infrastructure design must therefore ensure that GPUs can operate at maximum efficiency and deliver the expected computing performance.
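The effect described above can be put into rough numbers. Here is a minimal sketch; the assumption that project duration scales inversely with GPU utilization is an illustrative simplification, not a measured relationship:

```python
def project_impact(baseline_days: float, gpu_utilization: float) -> dict:
    """Estimate how low GPU utilization stretches a training project.

    Assumes duration scales inversely with utilization (a deliberate
    simplification for illustration only).
    """
    if not 0 < gpu_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    actual_days = baseline_days / gpu_utilization
    return {
        "actual_days": actual_days,
        "wasted_days": actual_days - baseline_days,
        "relative_roi": gpu_utilization,  # ROI scales with utilization here
    }

# A GPU cluster running at 50% efficiency doubles the project timeline.
print(project_impact(baseline_days=30, gpu_utilization=0.5))
```

Under this toy model, a 30-day project at 50% utilization takes 60 days, which is exactly the "time doubles, ROI halves" arithmetic above.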

It is important to note that both DGX A100 and DGX H100 servers offer up to 30 TB of internal storage. However, with the average model's data set at approximately 150 TB, that capacity is insufficient for most deep learning models, so additional external storage is required to feed data to the GPUs.
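A back-of-the-envelope check makes the storage gap concrete. The capacities are the figures quoted above; the function itself is purely illustrative:

```python
def external_storage_needed_tb(dataset_tb: float, internal_tb: float = 30.0) -> float:
    """Return how much external capacity (TB) a data set needs beyond
    the GPU server's internal storage (30 TB matches a DGX A100/H100)."""
    return max(0.0, dataset_tb - internal_tb)

# A 150 TB data set leaves a 120 TB shortfall to cover with external storage.
print(external_storage_needed_tb(150.0))  # → 120.0
```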

Storage Performance

An AI storage system is typically composed of servers, NVMe SSDs, and storage software, usually packaged as a simple appliance. Just as GPUs are optimized to process large amounts of data in parallel across tens of thousands of cores, storage must also be high-performance. In artificial intelligence, the basic requirement for storage is to hold the entire data set and deliver it to the GPUs at line speed (i.e., the fastest speed the network allows), keeping the GPUs saturated and running efficiently. Anything less wastes these very expensive and valuable GPU resources. Delivering data fast enough to keep a cluster of 10 or 15 GPU servers running at full speed optimizes GPU resources, improves performance across the environment, and stretches the budget to get the most from the entire infrastructure.
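To size storage for line-rate delivery, you can work backwards from the network. A sketch assuming each GPU server ingests over a single 200 Gb/s link; that link speed is an assumption for illustration, so substitute your fabric's actual line rate:

```python
def required_storage_gbps(num_gpu_servers: int,
                          link_gbits_per_sec: float = 200.0) -> float:
    """Aggregate read bandwidth (GB/s) the storage must sustain so that
    every GPU server can ingest at full line rate simultaneously."""
    return num_gpu_servers * link_gbits_per_sec / 8.0  # bits -> bytes

for n in (10, 15):
    print(f"{n} GPU servers: {required_storage_gbps(n):.0f} GB/s sustained")
```

At these assumed link speeds, the 10-to-15-server cluster mentioned above needs roughly 250 to 375 GB/s of sustained reads from storage.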

The challenge is that, in practice, most storage is not optimized for AI: vendors require many client compute nodes to extract full performance from the storage. If you start with a single GPU server, you may in turn need many storage nodes just to deliver the performance that one GPU server demands.

Don't trust every benchmark result: it is easy to show more aggregate bandwidth across multiple GPU servers, but AI depends on storage that can deliver its full performance to a single GPU node whenever needed. Insist on storage that provides the ultra-high performance you need from a single storage node, and can deliver that performance to a single GPU node. This may narrow the field of vendors, but it should be a priority at the start of your AI project journey.

Network Bandwidth

Increasingly powerful compute is driving up demand on the rest of the AI infrastructure. Bandwidth requirements have reached new heights, as vast amounts of data must travel over the network from storage and into the GPUs every second. Network adapters (NICs) in the storage system connect to switches, which in turn connect to adapters inside the GPU servers. Correctly configured, NICs can connect storage directly to the NICs in one or two GPU servers without bottlenecks. Keeping bandwidth high enough to carry the maximum data load from storage to the GPUs for sustained periods is the key to keeping the GPUs saturated; in many cases, failure to do so is why we see low GPU utilization.
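A simple sanity check along the path from storage NIC through switch to GPU-server NIC: the sustainable end-to-end rate is set by the slowest hop. The link speeds in the example are hypothetical:

```python
def path_bottleneck_gbps(*hop_gbps: float) -> float:
    """The end-to-end rate of a storage -> switch -> GPU-server path
    is limited by its slowest hop."""
    return min(hop_gbps)

# Storage NIC at 400 Gb/s, switch port at 400 Gb/s, GPU-server NIC at
# 200 Gb/s: the GPU server's NIC caps the sustained data rate.
print(path_bottleneck_gbps(400, 400, 200))  # → 200
```

Running this check against real link speeds makes it obvious where an under-provisioned hop will starve the GPUs.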

GPU Orchestration

Once the infrastructure is in place, GPU orchestration and allocation tools help teams assemble and allocate resources more efficiently, understand GPU usage, gain finer-grained control over resources, reduce bottlenecks, and improve utilization. These tools can only accomplish all of this as expected if the underlying infrastructure keeps the data flowing properly.
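At their simplest, such tools keep an inventory of GPUs and hand them out to jobs. A toy allocator to illustrate the idea; real orchestrators (e.g., Kubernetes device plugins or Slurm) do far more, including scheduling, health checks, and topology awareness:

```python
class GpuPool:
    """Minimal bookkeeping for allocating GPUs to named jobs."""

    def __init__(self, num_gpus: int):
        self.free = set(range(num_gpus))
        self.assigned: dict[str, set] = {}

    def allocate(self, job: str, count: int) -> set:
        """Reserve `count` free GPUs for `job`, or fail loudly."""
        if count > len(self.free):
            raise RuntimeError(f"only {len(self.free)} GPUs free")
        gpus = {self.free.pop() for _ in range(count)}
        self.assigned[job] = gpus
        return gpus

    def release(self, job: str) -> None:
        """Return a job's GPUs to the free pool."""
        self.free |= self.assigned.pop(job)

pool = GpuPool(8)
pool.allocate("train-job", 4)
print(len(pool.free))  # → 4
pool.release("train-job")
print(len(pool.free))  # → 8
```

Even this sketch shows why visibility matters: without accurate free/assigned bookkeeping, GPUs sit idle while jobs queue.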

In the field of artificial intelligence, data is the key input. Traditional enterprise flash, built for mission-critical enterprise applications (e.g., inventory-control database servers, email servers, backup servers), is therefore a poor fit for AI. These solutions are built on legacy protocols, and although they have been repurposed for AI, those legacy foundations limit their performance for GPU and AI workloads, drive up prices, and waste money on overly expensive, unnecessary features.

With the current global shortage of GPUs, coupled with the rapid growth of the artificial intelligence industry, finding ways to maximize GPU performance has never been more important, especially in the short term. As deep learning projects flourish, these methods become key ways to reduce costs and improve output.

Source: 51cto.com