On September 20, at Huawei Connect 2023, Wang Tao, Huawei's Managing Director, Director of the ICT Infrastructure Business Management Committee, and President of the Enterprise BG, officially released the new architecture of the Ascend AI computing cluster, Atlas 900 SuperCluster, which can support training of large models with more than one trillion parameters.
According to reports, the new cluster uses Huawei's new Galaxy AI intelligent computing switch, the CloudEngine XH16800. Thanks to its high-density 800GE ports, a two-layer switching network is enough to build an ultra-large-scale, non-convergent (non-blocking) cluster network of 2,250 nodes, equivalent to 18,000 cards.
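As a rough sanity check on these figures, 18,000 cards across 2,250 nodes works out to 8 cards per node. The sketch below also applies the textbook capacity formula for a non-blocking two-tier leaf-spine fabric built from switches of radix k, which tops out at k²/2 endpoints. It assumes, hypothetically, one network port per card; the article does not state the XH16800's actual port count or the exact topology, and the helper function names are illustrative, not a Huawei API.

```python
def cards_per_node(total_cards: int, nodes: int) -> int:
    """Accelerator cards per node, assuming cards are split evenly across nodes."""
    if total_cards % nodes:
        raise ValueError("cards do not divide evenly across nodes")
    return total_cards // nodes


def two_tier_nonblocking_endpoints(radix: int) -> int:
    """Max endpoints in a non-blocking (1:1) two-tier leaf-spine fabric.

    Each leaf splits its `radix` ports evenly: radix/2 down to endpoints and
    radix/2 up to spines. Each spine can reach at most `radix` leaves, so the
    fabric holds at most radix leaves * radix/2 endpoints = radix**2 / 2.
    """
    if radix % 2:
        raise ValueError("radix must be even for an even up/down split")
    return radix * (radix // 2)


def min_even_radix(endpoints: int) -> int:
    """Smallest even switch radix whose two-tier non-blocking fabric fits `endpoints`.

    Hypothetical assumption: every endpoint (card) needs exactly one port.
    """
    k = 2
    while two_tier_nonblocking_endpoints(k) < endpoints:
        k += 2
    return k


if __name__ == "__main__":
    print(cards_per_node(18_000, 2_250))  # 8
    print(min_even_radix(18_000))  # 190
```

Under these simplified assumptions, any switch exposing at least 190 ports of the required speed could reach 18,000 endpoints in two tiers without oversubscription; real deployments also depend on how many ports each node uses and on chassis internals.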
The new cluster also adopts an innovative super-node architecture that greatly improves large-model training capability. Huawei also draws on its combined strengths in computing, networking, storage, energy, and other fields to improve system reliability at the device, node, cluster, and service levels, extending the stable run time of large-model training from days to months.
Huawei also released CANN 7.0, a more open and easier-to-use heterogeneous computing architecture. It is fully compatible with the industry's AI frameworks, acceleration libraries, and mainstream large models, and it also opens up deeper underlying capabilities, allowing AI frameworks and acceleration libraries to call and manage computing resources more directly, letting developers build custom high-performance operators, and helping large models gain a competitive edge.
Huawei has also upgraded the Ascend C programming language. Its more efficient programming model simplifies operator implementation logic and greatly shortens the development cycle for fusion operators, enabling rapid development of AI models and applications.
For enterprises and developers worldwide, the Huawei Cloud website today officially launched the Ascend AI Cloud Service "Hundreds of Models and Thousands of Forms" zone. The zone offers mainstream open-source large models that have been fully adapted and optimized for Ascend AI cloud services, along with a toolchain for application development. All development tools are cloud-based, eliminating tedious configuration and allowing one-click access and immediate use.
▲ Ascend AI Cloud Service special zone
According to this site's inquiries, as of July this year, Ascend AI clusters had supported the construction of artificial intelligence computing centers in 25 cities across China, and the public computing platforms of 7 cities had been selected for the first batch of national "New Generation Artificial Intelligence Public Computing Power Open Innovation Platforms".
Ascend AI has attracted more than 30 hardware partners and more than 1,200 ISVs, and has launched more than 2,500 industry AI solutions, serving carriers, Internet companies, finance, and other industries at scale.