
How do you build an open-source model that can beat GPT-4o? For Llama 3.1 405B, Meta put it all in this paper

PHPz
Release: 2024-07-24 18:42:03

After an "accidental leak" two days early, Llama 3.1 was finally officially released last night. Llama 3.1 extends the context length to 128K and comes in 8B, 70B, and 405B versions, once again single-handedly raising the bar in the large-model race. For the AI community, the most important significance of Llama 3.1 405B is that it raises the ceiling of what open-source foundation models can do. Meta says that across a range of tasks its performance is comparable to the best closed-source models. The table below shows how the current Llama 3 series models perform on key benchmarks; the 405B model's results are very close to GPT-4o's.

[Image: benchmark comparison of the Llama 3 series models against leading closed-source models]

At the same time, Meta published the paper "The Llama 3 Herd of Models", which lays out the research details behind the Llama 3 series of models to date.

[Image: the paper "The Llama 3 Herd of Models"]

Llama 3.1 405B is pre-trained with an 8K context length, then continually trained with a 128K context length, and it supports multiple languages and tool use.
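For context, extending a model from an 8K to a 128K window during continued training is usually accompanied by rescaling the rotary position embeddings (RoPE) so that distant positions remain distinguishable. The snippet below is only a minimal sketch of that general idea, not Meta's recipe; the head dimension, context lengths, and base values (10,000 and 500,000) are assumptions used purely for illustration.

```python
import torch

def rope_inverse_frequencies(head_dim: int, base: float) -> torch.Tensor:
    """Standard RoPE inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1."""
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return base ** (-exponents)

def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angle for every (position, frequency) pair."""
    inv_freq = rope_inverse_frequencies(head_dim, base)
    positions = torch.arange(seq_len, dtype=torch.float32)
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim / 2)

# Illustrative numbers only: an 8K window with the common base of 10,000,
# then a 128K window with a much larger base so the slowest-rotating
# dimensions still stay well under a full rotation at the far end.
short_ctx = rope_angles(seq_len=8_192, head_dim=128, base=10_000.0)
long_ctx = rope_angles(seq_len=131_072, head_dim=128, base=500_000.0)

print(short_ctx.shape, long_ctx.shape)
```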

Meta enhanced the preprocessing and curation pipelines for pre-training data, as well as the quality-assurance and filtering methods applied to post-training data.
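The article does not detail those pipelines, so the following is purely a hypothetical sketch of the kind of cheap heuristic document filter a curation pipeline might start with; every rule and threshold here is invented for illustration and is far simpler than a production filter.

```python
import re

def simple_quality_filter(doc: str,
                          min_words: int = 50,
                          max_dup_line_ratio: float = 0.3,
                          min_alpha_ratio: float = 0.6) -> bool:
    """Keep a document only if it passes a few cheap heuristic checks.

    These rules (length, duplicate lines, alphabetic share) are stand-ins
    for the much richer rule-based and model-based filters a real
    curation pipeline would use.
    """
    words = doc.split()
    if len(words) < min_words:
        return False

    lines = [ln.strip() for ln in doc.splitlines() if ln.strip()]
    if lines:
        dup_ratio = 1.0 - len(set(lines)) / len(lines)
        if dup_ratio > max_dup_line_ratio:
            return False

    # English-only heuristic; a multilingual pipeline would use language ID instead.
    alpha_chars = len(re.findall(r"[A-Za-z]", doc))
    if len(doc) > 0 and alpha_chars / len(doc) < min_alpha_ratio:
        return False

    return True

corpus = ["Some long, well-formed document ...", "buy now!!! buy now!!!"]
kept = [doc for doc in corpus if simple_quality_filter(doc, min_words=3)]
print(kept)
```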

Meta believes there are three key levers for developing a high-quality foundation model: data, scale, and complexity management.

• Data: Meta improved both the quantity and quality of pre-training and post-training data compared with earlier Llama versions. Llama 3 is pre-trained on a corpus of roughly 15 trillion multilingual tokens, whereas Llama 2 used only 1.8 trillion tokens.
• Scale: the trained models are much larger than previous Llama models. The flagship language model uses 3.8 × 10^25 floating-point operations (FLOPs) for pre-training, nearly 50 times the compute of the largest Llama 2 model.
• Complexity management: according to scaling laws, Meta's flagship model is approximately compute-optimal in size, but the smaller models were trained far longer than a compute-optimal schedule would dictate. The results show that, for the same inference budget, these smaller models outperform compute-optimal ones. In the post-training phase, Meta uses the 405B flagship model to further improve the quality of the smaller 70B and 8B models.
To support large-scale production inference of the 405B model, Meta quantizes the weights from 16-bit (BF16) to 8-bit (FP8), reducing compute requirements and allowing the model to run on a single server node.
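As a rough picture of what the BF16-to-FP8 step involves, the sketch below quantizes a weight tensor to the FP8 E4M3 format with a single per-tensor scale. It is a minimal illustration of the general technique, not Meta's inference code; the paper itself describes the production scheme.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def quantize_to_fp8(weight_bf16: torch.Tensor):
    """Per-tensor symmetric quantization of a BF16 weight matrix to FP8 E4M3.

    Returns the FP8 payload plus the scale needed to recover approximate
    values at inference time (dequantize = fp8.float() * scale).
    """
    w = weight_bf16.float()
    scale = w.abs().max() / FP8_E4M3_MAX          # map the largest weight onto the FP8 range
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)   # requires PyTorch >= 2.1
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.float() * scale

torch.manual_seed(0)
weight = torch.randn(4096, 4096, dtype=torch.bfloat16) * 0.02
w_fp8, scale = quantize_to_fp8(weight)

# FP8 halves the weight memory relative to BF16 at a small accuracy cost.
error = (dequantize(w_fp8, scale) - weight.float()).abs().mean().item()
print(f"storage: {w_fp8.element_size()} byte/elem vs {weight.element_size()} byte/elem, "
      f"mean abs error: {error:.2e}")
```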
Pre-training the 405B model on 15.6T tokens (3.8 × 10^25 FLOPs) was a major challenge; Meta optimized the entire training stack and used more than 16K H100 GPUs.
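The reported budget can be sanity-checked with the standard ≈ 6·N·D estimate of dense-transformer training FLOPs (N parameters, D tokens). In the sketch below, the sustained per-GPU throughput is an assumed, illustrative number, not a figure from the paper.

```python
# Back-of-the-envelope check of the reported pre-training budget.
params = 405e9          # N: parameters of the flagship model
tokens = 15.6e12        # D: pre-training tokens
train_flops = 6 * params * tokens
print(f"~6*N*D = {train_flops:.2e} FLOPs")   # ~3.79e+25, matching the reported 3.8e25

# Rough wall-clock estimate. The per-GPU sustained throughput (~400 TFLOP/s,
# i.e. roughly 40% utilization of an H100's BF16 peak) is an assumption made
# for this example only.
gpus = 16_000
sustained_flops_per_gpu = 400e12
seconds = train_flops / (gpus * sustained_flops_per_gpu)
print(f"~{seconds / 86_400:.0f} days on {gpus} GPUs at the assumed throughput")
```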
As PyTorch founder and Meta Distinguished Engineer Soumith Chintala said, the Llama 3 paper reveals a lot of cool details, one of which is the construction of the training infrastructure.
During post-training, Meta improves the chat model through multiple rounds of alignment, including supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO). Most SFT samples are generated from synthetic data.
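For reference, direct preference optimization trains the policy to prefer the chosen response of each preference pair over the rejected one, relative to a frozen reference model. The code below is a minimal, generic sketch of the published DPO objective, not Meta's training code; the tensor names and the β value are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct preference optimization loss.

    Each argument is the summed log-probability that the policy or the
    frozen reference model assigns to the chosen / rejected response of a
    preference pair. beta controls how far the policy may drift from the
    reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of 4 preference pairs with made-up log-probabilities.
torch.manual_seed(0)
policy_chosen, policy_rejected = torch.randn(4), torch.randn(4)
ref_chosen, ref_rejected = torch.randn(4), torch.randn(4)
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```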
1. The researchers made several design choices to maximize the scalability of the model-development process. For example, a standard dense Transformer architecture with only minor adjustments was chosen instead of a mixture-of-experts model, in order to maximize training stability. Likewise, a relatively simple post-training procedure based on supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO) is used, rather than more complex reinforcement-learning algorithms, which tend to be less stable and harder to scale.
2. As part of the Llama 3 development process, the Meta team also built multimodal extensions of the model, giving it image-recognition, video-recognition, and speech-understanding capabilities. These models are still under active development and not yet ready for release, but the paper presents results from preliminary experiments with them.
3. Meta has updated its license to allow developers to use the output of Llama models to improve other models.
4. At the end of the paper there is also a long list of contributors: this combination of factors is what ultimately produced the Llama 3 series we see today.
5. Of course, using the 405B-scale Llama model remains a challenge for ordinary developers, requiring substantial compute resources and expertise.
6. Following the launch, the Llama 3.1 ecosystem is ready: more than 25 partners offer services that work with the latest model, including Amazon Cloud Technologies, NVIDIA, Databricks, Groq, Dell, Azure, Google Cloud, Snowflake, and more.


For more technical details, please refer to the original paper.


Source: jiqizhixin.com