Moore Threads "KUAE Intelligent Computing Cluster" version 1.2 released: supports 64K long text, adds the full LLaMA2 series of large models, and more

This site reported on August 19 that version 1.2 of the Moore Threads "KUAE Intelligent Computing Cluster" has been officially released. This version brings comprehensive optimizations at both the software and hardware levels, supports 64K long text, and adds the full LLaMA2 series of large models along with Baichuan, Yayi, Qwen2, Mixtral (MoE 8x7B), and other models.

What's new in this release:

MFU improvement
- When training a 100-billion-parameter model on a thousand-GPU cluster, MFU (model FLOPs utilization) increased by 10%.
- MFU for dense-model cluster training reaches up to 55%.
Flash Attention2 optimization
- Integrates the optimized Flash Attention2 technology of the MUSA SDK platform to improve large-model training efficiency and resource utilization.
64K long text support
- Enhances support for training large models on long text and improves handling of long-text understanding and generation tasks.
Mixture-of-Experts (MoE) model support
- Optimizes All2All communication and the matrix operations of muDNN operators across different shapes, and supports MoE large-model training.
Resuming training from checkpoints
- Improves checkpoint read and write performance, increasing training efficiency.
DeepSpeed optimization
- Adapts DeepSpeed and Ulysses to Moore Threads GPU clusters to enhance long-text training support.
- Adapted to many large models from China and abroad.
Improved stability
- Mature software and hardware, achieving 15 consecutive days of fault-free training.
- Introduces the KUAE Aegis reliability feature to strengthen monitoring, automatic diagnosis, and fault recovery.
Visualization/observability
- Introduces the PerfSight performance monitoring system, which displays resource consumption and performance data in real time during training.
New large models in the built-in model library
- Adds the full LLaMA2 series, Baichuan, Yayi, Qwen2, Mixtral (MoE 8x7B), and other models.
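
For readers unfamiliar with the MFU figures quoted in the update list: MFU is commonly defined as the FLOPs a training run actually spends on the model per second, divided by the theoretical peak of the hardware. The release does not say how Moore Threads measures it, so the following is only a minimal sketch using the widely used approximation of about 6 FLOPs per parameter per token for dense transformer training (attention FLOPs are ignored). The function names, the throughput, and the per-GPU peak are all hypothetical illustration values, not KUAE or Moore Threads specifications.

```python
def training_flops_per_token(num_params: float) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Uses the common 6 * N rule of thumb (forward + backward pass),
    which ignores the attention-specific term. This is an assumption,
    not the method used in the release.
    """
    return 6.0 * num_params


def mfu(num_params: float,
        tokens_per_second: float,
        num_gpus: int,
        peak_flops_per_gpu: float) -> float:
    """Return model FLOPs utilization as a fraction between 0 and 1."""
    achieved = training_flops_per_token(num_params) * tokens_per_second
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak


if __name__ == "__main__":
    # All values below are hypothetical and chosen only for illustration.
    utilization = mfu(
        num_params=100e9,          # 100-billion-parameter dense model
        tokens_per_second=90_000,  # cluster-wide training throughput
        num_gpus=1000,             # thousand-GPU cluster
        peak_flops_per_gpu=100e12  # assumed peak FLOPs per second per GPU
    )
    print(f"MFU: {utilization:.1%}")
```

Under this definition, a higher MFU at the same cluster size means more tokens processed per second, which is why gains in attention kernels, communication, and checkpointing all show up in the same metric.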
