Scaling laws continue to hold, and compute capacity can barely keep pace with the growth of large models. "Bigger scale, more compute, better results" has become the industry's guiding principle. Mainstream large models jumped from tens of billions to 1.8 trillion parameters in just one year, and giants such as Meta, Google, and Microsoft have been building super-clusters of more than 15,000 GPUs since 2022. Ten-thousand-GPU clusters have become standard equipment on the main battlefield of AI.
On July 3, Moore Threads announced in Shanghai a major upgrade to its flagship AI product, the KUAE intelligent computing cluster solution, expanding it from the current thousand-GPU scale to the ten-thousand-GPU scale. Built on full-featured GPUs, the KUAE ten-thousand-GPU cluster aims to be a leading domestic general-purpose accelerated computing platform capable of hosting 10,000 GPUs and delivering floating-point performance at the 10,000-PFLOPS level, designed specifically for training complex trillion-parameter models. This milestone sets a new benchmark for domestic GPU technology, represents a leap in the computing capability of domestic intelligent computing clusters, and will provide solid, reliable infrastructure for technological and application innovation, scientific research, and industrial upgrading in China's AI sector.
In addition, Moore Threads signed strategic agreements on three ten-thousand-GPU cluster projects with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, the General Contracting Company of China Energy Engineering Co., Ltd., and Guilin Huajue Big Data Technology Co., Ltd. (in no particular order), joining forces to build practical, usable domestic GPU clusters.
Moore Threads founder and CEO Zhang Jianzhong said: "We are now in the golden age of generative AI. Interwoven technologies are catalyzing the emergence of intelligence, and the GPU has become the innovation engine accelerating this new wave of technology. Moore Threads is committed to this historic endeavor, dedicated to providing accelerated-computing infrastructure and one-stop solutions worldwide and to building an advanced accelerated-computing platform for a digital-intelligence world that fuses AI and digital twins. As an important piece of Moore Threads' full-stack AI strategy, the intelligent computing cluster can supply abundant compute for the digital-intelligence transformation of every industry; it not only demonstrates Moore Threads' strength in technological innovation and engineering practice, but will also become a new starting point for advancing the AI industry." Looking at how large models are evolving, several trends stand out, and they make the core demands on computing power increasingly clear.
First, scaling laws will continue to hold. Since they were proposed in 2020, scaling laws have revealed the "brute-force aesthetics" behind large-model progress: through the deep integration of compute, algorithms, and data, and the accumulation of experience, model performance leaps forward. The industry widely agrees that this trend will continue to shape the development of large models. For scaling laws to keep paying off, a single cluster must be both large enough and general enough to keep pace with the technology's evolution.
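For context, the scaling laws referenced here are the power-law fits of Kaplan et al. (2020); the formulas and approximate exponents below are cited from that general literature, not from the announcement. Test loss falls as a power law in model parameters N, dataset size D, and compute budget C:

```latex
% Kaplan et al. (2020) power-law fits; constants are approximate.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
L(C_{\min}) = \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050
```

Because the exponents are small, each constant-factor reduction in loss demands a multiplicative increase in parameters, data, and compute, which is why cluster scale keeps climbing.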
Second, the Transformer architecture will not achieve total dominance; it will keep evolving alongside other architectures, forming a diverse technical ecosystem. The evolution of generative AI does not rely on scale alone: architectural innovation matters just as much. While the Transformer remains mainstream, emerging architectures such as Mamba, RWKV, and RetNet keep pushing computational efficiency and accelerating the pace of innovation. As the technology iterates, from dense to sparse models and on to the fusion of multimodal models, each advance underscores the hunger for more performant computing resources.
At the same time, the cross-domain fusion of AI, 3D, and HPC continues to accelerate, expanding the frontiers of spatial intelligence, physical AI, AI for Science, and world models. This makes the training and deployment environments for large models more complex and diverse, and the market increasingly demands a general-purpose accelerated computing platform that can support the converged development of AI+3D, AI + physical simulation, AI + scientific computing, and other workloads.
Under these converging trends, ten-thousand-GPU clusters have become standard on the main battlefield of large-model training. As compute demand keeps climbing, large-model training urgently needs a "super factory": a large, general-purpose accelerated computing platform that shortens training time and enables rapid iteration of model capabilities. International technology giants are already deploying clusters of 1,000 or even more than 10,000 GPUs to keep their large-model products competitive. As parameter counts move from hundreds of billions to trillions and model capabilities become more general, large models' demands on underlying compute escalate further. Clusters of 10,000 GPUs or more have become the ticket to this round of large-model competition.
However, building a ten-thousand-GPU cluster is not a simple matter of stacking 10,000 GPUs; it is a highly complex systems-engineering undertaking involving ultra-large-scale network interconnects, efficient cluster computing, long-term stability, and high availability. It is a difficult but worthwhile thing to do. Moore Threads aims to build a general-purpose accelerated computing platform at the 10,000+ GPU scale, prioritizing the problem of large-model training.
KUAE: a domestic 10,000-GPU, 10,000-PFLOPS platform for trillion-parameter model training
KUAE is Moore Threads' full-stack intelligent computing center solution. Built on full-featured GPUs and integrating hardware and software, it is a complete system-level compute solution comprising the KUAE computing cluster infrastructure at its core, the KUAE cluster management platform (KUAE Platform), and the KUAE large-model service platform (KUAE ModelStudio). Through integrated delivery, it addresses the construction and operations-management challenges of large-scale GPU computing.
Massive compute, 10,000 GPUs and 10,000 PFLOPS: In cluster compute performance, the new-generation KUAE intelligent computing cluster exceeds 10,000 GPUs in a single cluster, with floating-point performance reaching 10 exaFLOPS, greatly raising single-cluster performance and providing a solid compute foundation for training trillion-parameter models. In GPU memory and bandwidth, the KUAE ten-thousand-GPU cluster reaches PB-level total GPU memory capacity, PB/s-level total inter-GPU interconnect bandwidth, and PB/s-level total inter-node interconnect bandwidth, systematically co-optimizing compute, memory, and bandwidth to comprehensively improve cluster performance.
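As a quick sanity check of the headline figures (only the 10 exaFLOPS and 10,000-GPU totals come from the announcement; the per-GPU average is simple arithmetic, not a quoted spec):

```python
# Back-of-the-envelope: average per-GPU throughput implied by the headline numbers.
total_flops = 10e18        # 10 exaFLOPS, single-cluster aggregate
num_gpus = 10_000          # single-cluster GPU count
per_gpu = total_flops / num_gpus
print(f"{per_gpu / 1e15:.1f} PFLOPS per GPU on average")  # → 1.0 PFLOPS per GPU on average
```

The actual per-GPU figure depends on the precision (FP16/BF16/FP8) behind the exaFLOPS claim, which the announcement does not specify.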
Ultra-high stability, month-long stable training: Stability is a key measure of a 10,000+ GPU cluster. Moore Threads claims a mean time between failures of more than 15 days for the cluster and stable large-model training runs of more than 30 days, with a weekly effective-training-time target above 99%, well beyond the industry average. This rests on a set of predictable, diagnosable, multi-level reliability mechanisms developed in-house, including: automatic localization and diagnostic prediction of hardware and software faults, achieving minute-level fault localization; a multi-level checkpoint storage mechanism, achieving second-level in-memory checkpointing and minute-level recovery of training jobs; and a highly fault-tolerant, high-performance cluster management platform, achieving second-level management, allocation, and job scheduling.
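The claimed numbers are at least internally consistent, as a rough model shows. The MTBF is from the announcement; the per-failure overheads below are assumptions chosen to match the "minute-level" localization and recovery claims, not disclosed figures:

```python
# Illustrative model (assumed overheads, not announcement figures) of weekly
# effective training time given a mean time between failures (MTBF).
mtbf_days = 15.0               # claimed cluster MTBF
week_hours = 7 * 24.0
failures_per_week = 7 / mtbf_days

# Assumed per-failure overhead: minute-level fault localization and job
# recovery, plus work lost since the last (second-level, in-memory) checkpoint.
localization_min = 5.0
recovery_min = 10.0
lost_work_min = 5.0
cost_hours = (localization_min + recovery_min + lost_work_min) / 60.0

effective = 1 - failures_per_week * cost_hours / week_hours
print(f"weekly effective training time ≈ {effective:.2%}")  # → ≈ 99.91%
```

With minute-level recovery, even a failure every couple of weeks costs well under 1% of wall-clock time, which is why fast checkpointing matters more than raw MTBF at this scale.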
Extreme optimization, ultra-high MFU: Model FLOPs utilization (MFU) is a common metric for large-model training efficiency that directly reflects end-to-end cluster training efficiency. The KUAE ten-thousand-GPU cluster has been optimized across system software, frameworks, and algorithms, targeting an effective compute efficiency (MFU) of up to 60%, on par with international levels. At the system-software level, aggressive optimization of compute and communication efficiency substantially improves cluster execution efficiency and performance. At the framework and algorithm level, the cluster supports a variety of adaptive hybrid-parallel strategies and efficient memory optimizations, automatically selecting and configuring the optimal parallel strategy for a given workload, greatly improving training efficiency and memory utilization. For models with very long sequences, the cluster applies optimizations such as context parallelism (CP) and RingAttention to cut compute time and memory usage and further raise cluster training efficiency.
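For readers unfamiliar with the metric, MFU is the ratio of the FLOP/s a training run actually achieves to the hardware's peak FLOP/s. The definition below is the standard one from the literature; the example numbers (model size, throughput, peak) are hypothetical, not KUAE measurements:

```python
# Sketch of the standard MFU calculation. The 6 * params FLOPs-per-token
# approximation covers the forward + backward pass of a dense transformer.
def mfu(params, tokens_per_sec, num_gpus, peak_flops_per_gpu):
    """Model FLOPs utilization: achieved training FLOP/s over peak FLOP/s."""
    achieved = 6 * params * tokens_per_sec
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical example: a 70B-parameter dense model training at 400k tokens/s
# on 1,000 GPUs, each with an assumed 300 TFLOPS peak.
print(f"MFU = {mfu(70e9, 400_000, 1_000, 300e12):.1%}")  # → MFU = 56.0%
```

Because MFU folds in every source of overhead (communication, stragglers, recomputation, idle bubbles), a 60% end-to-end figure is an aggressive target at 10,000-GPU scale.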
Versatile and general, ecosystem-friendly: The KUAE ten-thousand-GPU cluster is a general-purpose accelerated computing platform designed for broad scenarios; it can accelerate large models of different architectures, such as LLMs, MoE, multimodal models, and Mamba, and of different modalities. Meanwhile, built on the efficient and easy-to-use MUSA programming language, full CUDA compatibility, and the automated migration tool Musify, it enables day-0 migration of new models and "instant-on" ecosystem adaptation, helping customers bring workloads online quickly.
Joining forces to build a large-model application ecosystem
Building ten-thousand-GPU clusters requires the concerted effort of the whole industry to bring innovative large-model applications to market quickly and put domestic compute to real use. At the launch event, Moore Threads signed strategic agreements with China Mobile Communications Group Qinghai Co., Ltd., China Unicom Qinghai Company, Beijing Dedao Xinke Group, the General Contracting Company of China Energy Engineering Co., Ltd., and Guilin Huajue Big Data Technology Co., Ltd. (in no particular order) on the Qinghai Zero-Carbon Industrial Park ten-thousand-GPU cluster project, the Qinghai Plateau KUAE ten-thousand-GPU cluster project, and the Guangxi-ASEAN ten-thousand-GPU cluster project, respectively.
With the help of Moore Threads' advanced KUAE full-stack intelligent-computing solution, the parties will work together to build powerful national platforms for industrial and intelligent computing, accelerating the digital transformation and high-quality development of industry. The KUAE ten-thousand-GPU cluster projects mark another major advance for domestic AI computing infrastructure and will inject new vitality into the development of regional digital economies.
Moore Threads signed strategic agreements with China Unicom Qinghai Company and Beijing Dedao Xinke Group, and with the General Contracting Company of China Energy Engineering Co., Ltd. and Guilin Huajue Big Data Technology Co., Ltd. After the signing ceremony, representatives of five partners, including Wuwen Xinqiong, Qingcheng Jizhi, 360, and JD Cloud, took the stage in turn to share how the Moore Threads KUAE intelligent computing cluster helps them innovate in scenarios and fields such as large-model training, large-model inference, and embodied intelligence, demonstrating the cluster's enormous potential and broad applicability in real-world applications.