Launched a free personalized academic paper recommendation system - the 'arXiv customized platform' of the top visual teams of German universities-AI-php.cn

Generating one image in 10 milliseconds means 6,000 images in one minute. What does that look like?

The picture below gives you a real feel for the power of AI.

What's more, as you keep adding new elements to the prompt for an anime-style girl, each change to the picture appears in a flash.

This remarkable real-time generation speed comes from StreamDiffusion, proposed by researchers from UC Berkeley, the University of Tsukuba in Japan, and other institutions.

The new solution is a diffusion pipeline that enables real-time interactive image generation at over 100fps.

Paper address: https://arxiv.org/abs/2312.12491

After being open-sourced, StreamDiffusion quickly topped the GitHub rankings, garnering 3.7k stars.

StreamDiffusion replaces sequential denoising with a batch processing strategy, which is about 1.5 times faster than the traditional approach. Moreover, the new residual classifier-free guidance (RCFG) algorithm proposed by the authors can be up to 2.05 times faster than conventional classifier-free guidance.

The most noteworthy thing is that the new method can achieve an image-to-image generation speed of 91.07fps on the RTX 4090.

In the future, StreamDiffusion could enable fast generation in scenarios such as the metaverse, video game graphics rendering, and live video streaming, meeting the high throughput requirements of these applications.

In particular, real-time image generation can provide powerful editing and creative capabilities for those who work in game development and video rendering.

Designed specifically for real-time image generation

Currently, applications of diffusion models in various fields need a diffusion pipeline with high throughput and low latency to keep human-computer interaction efficient.

A typical example is using a diffusion model to create a virtual character, or VTuber, that can respond fluidly to user input.

To improve throughput and real-time interactivity, current research mainly focuses on reducing the number of denoising iterations, for example from 50 iterations down to a few, or even just one.

A common strategy is to distill a multi-step diffusion model into a few steps and reformulate the diffusion process using ODEs. Diffusion models have also been quantized to improve efficiency.

In the latest paper, the researchers take an orthogonal direction and introduce StreamDiffusion, a real-time diffusion pipeline designed for high-throughput interactive image generation.

Existing model design work can still be integrated with StreamDiffusion. The pipeline also allows N-step denoising diffusion models to be used while maintaining high throughput, giving users more flexible options.

Real-time image generation | Columns 1 and 2: examples of AI-assisted real-time drawing. Column 3: 2D illustrations rendered in real time from 3D avatars. Columns 4 and 5: real-time camera filters.

How is it implemented?

StreamDiffusion Architecture

StreamDiffusion is a new diffusion pipeline designed to increase throughput.

It consists of several key parts:

A Stream Batch strategy, residual classifier-free guidance (RCFG), input and output queues, a Stochastic Similarity Filter (SSF), a pre-computation procedure, and model acceleration tools with a tiny autoencoder.

Batch denoising

In a diffusion model, the denoising steps are performed sequentially, so the U-Net processing time increases in proportion to the number of steps.

However, in order to generate high-fidelity images, the number of steps has to be increased.

In order to solve the problem of high-latency generation in interactive diffusion, researchers proposed a method called Stream Batch.

As shown in the figure below, instead of waiting for a single image to be fully denoised before processing the next input image, the new method accepts the next input image after each denoising step.

This forms a denoising batch, and the denoising steps for each image are staggered.

By concatenating these interleaved denoising steps into a batch, researchers can use U-Net to efficiently process batches of consecutive inputs.

An input image encoded at time step t is fully generated and decoded at time step t + n, where n is the number of denoising steps.
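
The staggered schedule can be illustrated with a toy simulation. This is a minimal sketch, not the authors' implementation: `denoise_batch` is a hypothetical stand-in for one batched U-Net call, and each frame is tracked only by its id and the number of denoising steps it has received so far.

```python
from collections import deque

def denoise_batch(batch):
    """Hypothetical stand-in for one batched U-Net call:
    advances every in-flight frame by one denoising step."""
    return [(frame_id, steps_done + 1) for frame_id, steps_done in batch]

def stream_batch(frame_ids, n_steps):
    """Simulate Stream Batch: admit one new frame per tick and
    denoise all in-flight frames together in a single batch."""
    in_flight = deque()          # (frame_id, steps_done), oldest first
    finished = []                # (frame_id, tick at which it completed)
    pending = deque(frame_ids)
    tick = 0
    while pending or in_flight:
        if pending:
            in_flight.append((pending.popleft(), 0))
        in_flight = deque(denoise_batch(list(in_flight)))
        tick += 1
        # Only the oldest frame can have reached n_steps on this tick.
        if in_flight and in_flight[0][1] == n_steps:
            finished.append((in_flight.popleft()[0], tick))
    return finished

result = stream_batch(frame_ids=[0, 1, 2, 3], n_steps=3)
```

In this toy run, the frame admitted at tick t completes at tick t + n, and once the batch is full one finished frame leaves the pipeline per batched call, even though each frame still receives all n denoising steps.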

Residual Classifier-Free Guidance (RCFG)

Conventional classifier-free guidance (CFG) is an algorithm that enhances the effect of the original condition by performing vector calculations between the unconditional (or negative-condition) term and the original conditional term.

This can bring benefits such as enhancing the effect of the prompt.

However, in order to compute negative conditional residual noise, each input latent variable needs to be paired with a negative conditional embedding and passed to U-Net at each inference time.
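
The vector calculation behind CFG is commonly written as the unconditional noise plus a scaled difference between the conditional and unconditional noise. Below is a minimal sketch of that widely used form, with plain Python lists standing in for noise tensors; the function name, the toy values, and the guidance scale are illustrative assumptions.

```python
def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Commonly used classifier-free guidance form:
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

eps_cond = [1.0, 2.0, -1.0]     # toy noise predicted with the prompt
eps_uncond = [0.5, 2.0, 0.0]    # toy noise predicted with the negative/empty prompt
guided = cfg_combine(eps_cond, eps_uncond, guidance_scale=2.0)
```

Each of the two inputs comes from its own U-Net pass, which is where the extra cost described above originates. A guidance scale of 1.0 simply returns the conditional prediction.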

To solve this problem, the authors introduce an innovative residual classifier-free guidance (RCFG).

This method uses virtual residual noise to approximate the negative condition, so the negative-condition noise only needs to be computed in the initial stage of the process, which significantly reduces the additional U-Net inference cost of negative-condition embeddings.
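
To see where the savings come from, it helps to count U-Net forward passes. The cost model below is an assumption made for illustration: conventional CFG is taken to need two passes per denoising step, the one-time variant one extra pass at the first step only, and the self-negative variant no extra pass at all. The mode names are hypothetical labels, not identifiers from the project.

```python
def unet_calls(n_steps, mode):
    """Count U-Net forward passes for n_steps denoising steps (assumed cost model)."""
    if mode == "cfg":             # conditional + negative pass at every step
        return 2 * n_steps
    if mode == "rcfg_onetime":    # negative pass only at the first step
        return n_steps + 1
    if mode == "rcfg_self":       # negative term approximated, no extra pass
        return n_steps
    raise ValueError(f"unknown mode: {mode}")

# Ratio of CFG calls to RCFG calls at five denoising steps.
ratios = {m: unet_calls(5, "cfg") / unet_calls(5, m)
          for m in ("rcfg_self", "rcfg_onetime")}
```

Under this model, a single denoising step costs the same for CFG and the one-time variant, and the gap widens as steps are added. Measured speedups will differ from these ratios because real inference time includes more than U-Net passes.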

Input and output queue

Converting an input image into a tensor format that the pipeline can handle and, conversely, converting the decoded tensor back into an output image both require non-negligible additional processing time.

To avoid adding these image processing times to the neural network inference process, we separate image pre- and post-processing into different threads, thereby enabling parallel processing.

In addition, by using input tensor queues, it is also possible to cope with temporary interruptions in input images due to device failures or communication errors, allowing for smooth streaming.
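
The thread-and-queue arrangement can be sketched with Python's standard `queue` and `threading` modules. This is only an illustration of the idea: `preprocess`, `infer`, and `postprocess` are hypothetical stand-ins that operate on integers rather than real images and tensors.

```python
import queue
import threading

def preprocess(frame):
    """Hypothetical stand-in for image -> tensor conversion."""
    return frame * 2

def infer(tensor):
    """Hypothetical stand-in for the diffusion pipeline."""
    return tensor + 1

def postprocess(tensor):
    """Hypothetical stand-in for tensor -> image conversion."""
    return f"image({tensor})"

STOP = object()  # sentinel that tells a worker to shut down

def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)   # propagate shutdown to the next stage
            return
        outbox.put(fn(item))

raw, tensors, results, images = (queue.Queue() for _ in range(4))
workers = [
    threading.Thread(target=stage, args=(preprocess, raw, tensors)),
    threading.Thread(target=stage, args=(infer, tensors, results)),
    threading.Thread(target=stage, args=(postprocess, results, images)),
]
for w in workers:
    w.start()
for frame in [1, 2, 3]:
    raw.put(frame)
raw.put(STOP)
for w in workers:
    w.join()

outputs = []
while True:
    item = images.get()
    if item is STOP:
        break
    outputs.append(item)
```

Because each stage blocks on its input queue, a temporary gap in incoming frames just leaves the downstream stages waiting instead of breaking the stream, and with one worker per stage the output order matches the input order.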

Stochastic Similarity Filter

The following figure shows the core diffusion inference pipeline, including VAE and U-Net.

Introducing denoising batching together with a pre-computed prompt embedding cache, a sampled noise cache, and a scheduler value cache speeds up the inference pipeline and enables real-time image generation.

The Stochastic Similarity Filter (SSF) is designed to reduce GPU power consumption: it can dynamically gate the diffusion model pipeline, allowing fast and efficient real-time inference.
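
A sketch of how such a filter could work is shown below. The skip rule is an assumption for illustration: the cosine similarity between the current frame and a reference frame is mapped linearly to a skip probability once it exceeds a threshold `eta`. The function names and the flat-list frame representation are hypothetical.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def skip_probability(frame, reference, eta=0.98):
    """Assumed rule: 0 below the threshold, rising linearly to 1 for identical frames."""
    sim = cosine_similarity(frame, reference)
    return min(1.0, max(0.0, (sim - eta) / (1.0 - eta)))

def should_skip(frame, reference, eta=0.98, rng=random):
    """Sample the decision, so near-static input is skipped often
    but not with a hard, deterministic cut-off."""
    return rng.random() < skip_probability(frame, reference, eta)

static_p = skip_probability([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical frames
moving_p = skip_probability([1.0, 2.0, 3.0], [3.0, -1.0, 0.5])  # very different frames
```

When a frame is skipped, the VAE and U-Net passes are not run for it, which is how static input can translate into lower GPU usage.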

Precomputation

The U-Net architecture requires both an input latent variable and a conditioning embedding.

Normally, the conditioning embedding is derived from the prompt embedding and stays the same across frames.

To exploit this, the researchers pre-compute the prompt embedding and store it in a cache. In interactive or streaming mode, this pre-computed prompt embedding cache is reused.

Within the U-Net, the keys and values for each frame are computed from the pre-computed prompt embedding.

Therefore, the researchers modified the U-Net to store these key-value pairs so that they can be reused. Whenever the input prompt is updated, the key-value pairs inside the U-Net are recomputed and updated.
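
The caching pattern can be sketched in a few lines. This is a simplified illustration rather than the project's code: `toy_encoder` is a hypothetical stand-in for a text encoder, and the cache stores only the prompt embedding, not the U-Net key-value pairs.

```python
class PromptEmbeddingCache:
    """Recompute the prompt embedding only when the prompt text changes."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn   # hypothetical text encoder
        self._prompt = None
        self._embedding = None
        self.encode_calls = 0         # counter kept for illustration only

    def get(self, prompt):
        if prompt != self._prompt:
            self._embedding = self._encode_fn(prompt)
            self._prompt = prompt
            self.encode_calls += 1
        return self._embedding

def toy_encoder(prompt):
    """Hypothetical stand-in for a text encoder: one number per word."""
    return [float(len(word)) for word in prompt.split()]

cache = PromptEmbeddingCache(toy_encoder)
for _ in range(100):                      # 100 frames with the same prompt
    emb_a = cache.get("girl with glasses")
emb_b = cache.get("girl with red hair")   # the prompt is updated once
```

Across 101 frames the encoder runs only twice, once per distinct prompt. The same invalidate-on-change idea applies to the stored key-value pairs.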

Model Acceleration and Tiny Autoencoders

To optimize speed, we configured the system to use a static batch size and a fixed input size (height and width).

This approach ensures that the computation graph and memory allocation are optimized for the specific input size, resulting in faster processing.

However, this means that processing images of a different shape (i.e. a different height and width) or using a different batch size (including the batch size for the denoising steps) requires rebuilding the optimized configuration for those specific dimensions.

Experimental evaluation

Quantitative evaluation of denoising batches

Figure 8 compares the efficiency of batch denoising with the original sequential U-Net loop.

With the batch denoising strategy, the researchers found a significant improvement in processing time: it is cut roughly in half compared with a traditional U-Net loop that runs the denoising steps sequentially.

Even with the neural network acceleration tool TensorRT applied, the proposed Stream Batch still significantly improves on the efficiency of the original sequential diffusion pipeline across different numbers of denoising steps.

Additionally, the researchers compared the new method with the AutoPipelineForImage2Image pipeline developed by Hugging Face Diffusers.

The average inference time comparison is shown in Table 1. The latest pipeline shows that the speed has been greatly improved.

With TensorRT, StreamDiffusion achieves a 13x speedup when running 10 denoising steps. With only a single denoising step, the speedup reaches 59.6x.

Even without TensorRT, StreamDiffusion is 29.7 times faster than AutoPipeline with single-step denoising, and 8.3 times faster with 10-step denoising.

Table 2 compares the inference time of the StreamDiffusion pipeline using RCFG and conventional CFG.

With single-step denoising, the inference times of Onetime-Negative RCFG and conventional CFG are almost the same. However, as the number of denoising steps increases, the speed advantage of RCFG over conventional CFG becomes more pronounced.

With five denoising steps, Self-Negative RCFG is 2.05 times faster than conventional CFG, and Onetime-Negative RCFG is 1.79 times faster.

After this, the researchers comprehensively evaluated energy consumption. The results can be seen in Figures 6 and 7.

These figures show GPU usage patterns when SSF (with the threshold eta set to 0.98) is applied to input videos containing scenes with periodic static characteristics. The comparative analysis shows that when the input images are mostly static and highly similar, SSF can significantly reduce GPU usage.

Ablation study

Table 3 shows the impact of the different modules on average inference time at different numbers of denoising steps. As can be seen, the contribution of each module is verified by removing it in the image-to-image generation process.

Qualitative results

Figure 10 demonstrates how well generated images align with the prompt when conditions are adjusted quickly using residual classifier-free guidance (RCFG).

Images generated without any form of CFG align only weakly with the prompt; in particular, changes such as altering colors or adding elements that were not there are not carried out effectively.

In contrast, using CFG or RCFG enhances the ability to modify the original image, such as changing hair color, adding body patterns, and even adding objects like glasses. Notably, RCFG can strengthen the influence of the prompt compared with standard CFG.

Finally, the quality of the standard text-to-image generation results is shown in Figure 11.

Using the sd-turbo model, you can generate high-quality images like the one shown in Figure 11 in just one step.

When generating images with the researchers' StreamDiffusion pipeline and the sd-turbo model in an environment with GPU: RTX 4090, CPU: Core i9-13900K, OS: Ubuntu 22.04.3 LTS, it is feasible to produce images of this quality at over 100fps.

Netizens get hands-on, and a wave of anime girls arrives

The project's code has been open-sourced and has collected 3.7k stars on GitHub.

Project address: https://github.com/cumulo-autumn/StreamDiffusion

Many netizens have already begun generating their own anime waifus.

There are also real-time animations.

10x speed hand-drawn generation.

If you are interested, why not give it a try yourself?
