NetEase's open-source inference acceleration framework for transformer-based models supports high-performance single-GPU inference of models with tens of billions of parameters on mid- to low-end Ampere-architecture cards.
Transformer-based large-scale models have proven effective on a wide variety of tasks across many fields. However, applying them in industrial production requires considerable effort to reduce inference cost. To fill this gap, we propose a scalable inference solution: Easy and Efficient Transformer (EET). EET is a system that includes a series of Transformer inference optimizations at both the algorithm and implementation levels. By optimizing the Transformer's computation and data flow, EET significantly reduces inference cost and improves model efficiency and performance. Our experimental results show that EET substantially improves inference speed and resource utilization without any loss of model accuracy, providing a simple and effective solution for deploying large-scale models in industrial production.
First, we design a highly optimized kernel for long inputs and large hidden sizes.
In addition, we propose a flexible CUDA memory manager that reduces the memory footprint when deploying large models. Compared with the state-of-the-art Transformer inference library Faster Transformer (v4.0), EET achieves an average 1.40x-4.20x acceleration on the decoding layer on an A100 GPU.
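To make the memory-manager idea concrete, here is a minimal sketch of a caching buffer pool in the spirit of the flexible CUDA memory manager described above. All names are illustrative, not EET's actual API, and a plain `bytearray` stands in for device memory: buffers are requested by name, allocated grow-only, and reused across all transformer layers, so peak memory is bounded by the largest request per buffer rather than growing with the number of layers.

```python
class BufferManager:
    """Illustrative grow-only buffer pool (not EET's real implementation)."""

    def __init__(self):
        # name -> bytearray; a real manager would hold CUDA device pointers.
        self._pool = {}

    def request(self, name, nbytes):
        """Return a buffer of at least nbytes, reusing a cached one if large enough."""
        buf = self._pool.get(name)
        if buf is None or len(buf) < nbytes:
            buf = bytearray(nbytes)  # grow-only: never shrink, never free per-layer
            self._pool[name] = buf
        return memoryview(buf)[:nbytes]

    def total_bytes(self):
        """Total memory held by the pool (the peak footprint)."""
        return sum(len(b) for b in self._pool.values())


mgr = BufferManager()
# 24 decoder layers share one activation workspace instead of 24 allocations.
for layer in range(24):
    act = mgr.request("ffn_activation", 4096 * 4)
```

Because each named workspace is allocated once and recycled layer by layer, the footprint here stays at a single 16 KiB buffer regardless of depth; the same principle is what lets a large model fit on a single mid-range card.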
https://arxiv.org/abs/2104.12470
https://github.com/NetEase-FuXi/EET