Table of Contents
Models and Methods
Home Technology peripherals AI Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

Jun 27, 2023 pm 05:46 PM
ai paper

Generative AI has taken the artificial intelligence community by storm. Both individuals and enterprises have begun to be keen on creating related modal conversion applications, such as Vincent pictures, Vincent videos, Vincent music, etc.

Recently, several researchers from scientific research institutions such as ServiceNow Research and LIVIA have tried to generate charts in papers based on text descriptions. To this end, they proposed a new method of FigGen, and the related paper was also included as a Tiny Paper in ICLR 2023.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.Picture

Paper address: https://arxiv.org/pdf/2306.00800.pdf

Some people may ask, what is so difficult about generating the charts in the paper? How does this help scientific research?

Scientific research chart generation helps to disseminate research results in a concise and easy-to-understand way, and automatically generating charts can bring many advantages to researchers, such as saving time and energy without spending a lot of money. Take the effort to design a chart from scratch. In addition, designing visually appealing and easy-to-understand figures can make the paper accessible to more people.

However, generating diagrams also faces some challenges. It needs to represent complex relationships between discrete components such as boxes, arrows, and text. Unlike generating natural images, concepts in paper graphs may have different representations and require fine-grained understanding. For example, generating a neural network graph will involve an ill-posed problem with high variance.

Therefore, the researchers in this paper trained a generative model on a dataset of paper diagram pairs to capture the relationship between diagram components and the corresponding text in the paper. This requires dealing with varying lengths and highly technical text descriptions, different chart styles, image aspect ratios, and text rendering fonts, sizes, and orientation issues.

In the specific implementation process, the researchers were inspired by recent text-to-image results, used the diffusion model to generate charts, and proposed a potential diffusion to generate scientific research charts from text descriptions. Model - FigGen.

What are the unique features of this diffusion model? Let's move on to the details.

Models and Methods

The researchers trained a latent diffusion model from scratch.

First learn an image autoencoder to map images into compressed latent representations. The image encoder uses KL loss and OCR perceptual loss. The text encoder used for conditioning is learned end-to-end in the training of this diffusion model. Table 3 below shows the detailed parameters of the image autoencoder architecture.

The diffusion model then interacts directly in the latent space, performing data-corrupted forward scheduling while learning to exploit temporal and textual conditional denoising U-Net to recover from the process.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

## As for the data set, the researchers used Paper2Fig100k, which consists of graph-text pairs in the paper and contains 81,194 training samples and 21,259 validation samples. Figure 1 below is an example of a diagram generated using text descriptions in the Paper2Fig100k test set.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

Model details

First the images Encoder. In the first stage, the image autoencoder learns a mapping from pixel space to a compressed latent representation, making diffusion model training faster. The image encoder also needs to learn to map the latent image back to pixel space without losing important details of the diagram (such as text rendering quality).

To this end, the researchers defined a convolutional codec with a bottleneck that downsamples the image at factor f=8. The encoder is trained to minimize KL loss, VGG-aware loss, and OCR-aware loss with Gaussian distribution.

Second is the text encoder. Researchers have found that general-purpose text encoders are not suitable for graph generation tasks. Therefore they define a Bert transformer trained from scratch in the diffusion process, which uses an embedding channel of size 512, which is also the embedding size that regulates the cross-attention layers of U-Net. The researchers also explored changes in the number of transformer layers under different settings (8, 32, and 128).

Finally, there is the latent diffusion model. Table 2 below shows the network architecture of U-Net. We perform the diffusion process on a perceptually equivalent latent representation of the image, where the input size of the image is compressed to 64x64x4, making the diffusion model faster. They defined 1,000 diffusion steps and linear noise scheduling.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

##Training details

To train the image autoencoder, the researchers used an Adam optimizer with an effective batch size of 4 samples and a learning rate of 4.5e−6, during which four 12GB NVIDIA V100 graphics cards were used. To achieve training stability, they warmup the model in 50k iterations without using the discriminator.

For training the latent diffusion model, the researchers also used the Adam optimizer, which has an effective batch size of 32 and a learning rate of 1e−4. When training the model on the Paper2Fig100k dataset, they used eight 80GB NVIDIA A100 graphics cards.

Experimental results

In the generation process, the researcher used a DDIM sampler with 200 steps and generated 12,000 samples to calculate FID, IS, KID and OCR-SIM1. Robust use of classifier-free guidance (CFG) to test hyperconditioning.

Table 1 below shows the results for different text encoders. It can be seen that large text encoders produce the best qualitative results and condition generation can be improved by increasing the size of the CFG. Although the qualitative samples are not of sufficient quality to solve the problem, FigGen has grasped the relationship between text and images.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.

Figure 2 below shows the additional FigGen samples generated when adjusting the Classifier-Free Guidance (CFG) parameters. The researchers observed that increasing the size of the CFG, which was also demonstrated quantitatively, led to improvements in image quality.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.Pictures

Figure 3 below shows more generation examples from FigGen. Pay attention to the variation in length between samples, as well as the technical level of the text description, which will closely affect the difficulty of the model to correctly generate understandable images.

Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.Picture

However, the researchers also admitted that although these generated charts cannot provide practical help to the authors of the paper, they still It can be regarded as a promising direction of exploration.

For more research details, please refer to the original paper.

The above is the detailed content of Paper illustrations can also be automatically generated, using the diffusion model, and are also accepted by ICLR.. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Rexas Finance (RXS) can surpass Solana (Sol), Cardano (ADA), XRP and Dogecoin (Doge) in 2025 Rexas Finance (RXS) can surpass Solana (Sol), Cardano (ADA), XRP and Dogecoin (Doge) in 2025 Apr 21, 2025 pm 02:30 PM

In the volatile cryptocurrency market, investors are looking for alternatives that go beyond popular currencies. Although well-known cryptocurrencies such as Solana (SOL), Cardano (ADA), XRP and Dogecoin (DOGE) also face challenges such as market sentiment, regulatory uncertainty and scalability. However, a new emerging project, RexasFinance (RXS), is emerging. It does not rely on celebrity effects or hype, but focuses on combining real-world assets (RWA) with blockchain technology to provide investors with an innovative way to invest. This strategy makes it hoped to be one of the most successful projects of 2025. RexasFi

Global Asset launches new AI-driven intelligent trading system to improve global trading efficiency Global Asset launches new AI-driven intelligent trading system to improve global trading efficiency Apr 20, 2025 pm 09:06 PM

Global Assets launches a new AI intelligent trading system to lead the new era of trading efficiency! The well-known comprehensive trading platform Global Assets officially launched its AI intelligent trading system, aiming to use technological innovation to improve global trading efficiency, optimize user experience, and contribute to the construction of a safe and reliable global trading platform. The move marks a key step for global assets in the field of smart finance, further consolidating its global market leadership. Opening a new era of technology-driven and open intelligent trading. Against the backdrop of in-depth development of digitalization and intelligence, the trading market's dependence on technology is increasing. The AI ​​intelligent trading system launched by Global Assets integrates cutting-edge technologies such as big data analysis, machine learning and blockchain, and is committed to providing users with intelligent and automated trading services to effectively reduce human factors.

The global unkillable token market has begun the new moon in April 2025 healthy and strong The global unkillable token market has begun the new moon in April 2025 healthy and strong Apr 20, 2025 pm 06:21 PM

DefiDungeons, an NFT series in a fantasy placement RPG game powered by the Solana blockchain, showed strong market performance at the beginning of April 2025. The global NFT market started strongly in April after weeks of growth in trading volume. In the past 24 hours, the total NFT transaction volume reached US$14 million, with trading volume soaring 46% from the previous day. Here are the hottest NFT series in the first week of April: DefiDungeonsNFT series: This fantasy placement RPG game NFT on the Solana blockchain became the best-selling NFT series in the first week of April. In the past 24 hours, the transaction volume reached US$1.5 million, with a total market value of more than US$5 million.

Why is the rise or fall of virtual currency prices? Why is the rise or fall of virtual currency prices? Why is the rise or fall of virtual currency prices? Why is the rise or fall of virtual currency prices? Apr 21, 2025 am 08:57 AM

Factors of rising virtual currency prices include: 1. Increased market demand, 2. Decreased supply, 3. Stimulated positive news, 4. Optimistic market sentiment, 5. Macroeconomic environment; Decline factors include: 1. Decreased market demand, 2. Increased supply, 3. Strike of negative news, 4. Pessimistic market sentiment, 5. Macroeconomic environment.

Web3 trading platform ranking_Web3 global exchanges top ten summary Web3 trading platform ranking_Web3 global exchanges top ten summary Apr 21, 2025 am 10:45 AM

Binance is the overlord of the global digital asset trading ecosystem, and its characteristics include: 1. The average daily trading volume exceeds $150 billion, supports 500 trading pairs, covering 98% of mainstream currencies; 2. The innovation matrix covers the derivatives market, Web3 layout and education system; 3. The technical advantages are millisecond matching engines, with peak processing volumes of 1.4 million transactions per second; 4. Compliance progress holds 15-country licenses and establishes compliant entities in Europe and the United States.

What are the top ten platforms in the currency exchange circle? What are the top ten platforms in the currency exchange circle? Apr 21, 2025 pm 12:21 PM

The top exchanges include: 1. Binance, the world's largest trading volume, supports 600 currencies, and the spot handling fee is 0.1%; 2. OKX, a balanced platform, supports 708 trading pairs, and the perpetual contract handling fee is 0.05%; 3. Gate.io, covers 2700 small currencies, and the spot handling fee is 0.1%-0.3%; 4. Coinbase, the US compliance benchmark, the spot handling fee is 0.5%; 5. Kraken, the top security, and regular reserve audit.

The list of compliance licenses for the top ten blockchain exchanges. What are the platforms for strict supervision selection? The list of compliance licenses for the top ten blockchain exchanges. What are the platforms for strict supervision selection? Apr 21, 2025 am 08:12 AM

Binance, OKX, Coinbase, Kraken, Huobi, Gate.io, Gemini, XBIT, Bitget and MEXC are all strictly selected platforms for regulatory selection. 1. Binance adapts to supervision by setting up a compliance sub-platform. 2. OKX strengthens compliance management through KYB verification. 3.Coinbase is known for its compliance and user-friendliness. 4.Kraken provides powerful KYC and anti-money laundering measures. 5. Huobi operates in compliance with the Asian market. 6.Gate.io focuses on transaction security and compliance. 7. Gemini focuses on security and transparency. 8.XBIT is a model of decentralized transaction compliance. 9.Bitget continues to work in compliance operations. 10.MEXC passed

Keep up with the pace of Coinjie.com: What is the investment prospect of crypto finance and AaaS business Keep up with the pace of Coinjie.com: What is the investment prospect of crypto finance and AaaS business Apr 21, 2025 am 10:42 AM

The investment prospects of crypto finance and AaaS businesses are analyzed as follows: 1. Opportunities of crypto finance include market size growth, gradual clear regulation and expansion of application scenarios, but face market volatility and technical security challenges. 2. The opportunities of AaaS business lie in the promotion of technological innovation, data value mining and rich application scenarios, but the challenges include technical complexity and market acceptance.

See all articles