Meta recently launched an AI sound generation model called Audiobox. The model accepts both voice and text input, so users can generate the audio they need from a voice sample, a text description, or both.
The model is reportedly based on Voicebox, the AI model Meta launched in June this year. Audiobox is said to generate a variety of environmental sounds as well as natural conversational speech, and it combines audio generation and editing capabilities so that users can freely produce the audio they need.
Meta said that generating high-quality audio requires large audio libraries and deep domain knowledge, resources that are difficult for the general public to obtain. The company launched this model to lower the barrier to sound generation and make it easier for anyone to produce sound effects for videos, games, and other applications.
IT House found that Audiobox builds on Voicebox's "guided sound" mechanism to steer generation toward the target audio, and combines it with the "flow-matching" generation method to provide an "audio infilling" function that produces multi-layered audio.
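The article does not describe Audiobox's internals, so the following is only a generic sketch of how a flow-matching training example with an infilling mask can be built, not Meta's implementation. The function name `flow_matching_pair`, the `sigma_min` default, and the use of a plain list of floats as a stand-in for audio features are all assumptions made for illustration.

```python
import random

def flow_matching_pair(x1, mask, t, sigma_min=1e-5, rng=None):
    """Build one illustrative training example for masked flow matching.

    x1   : target audio features (list of floats; a toy stand-in for real features)
    mask : list of bools, True where the model is asked to fill in audio
    t    : time in [0, 1]; t=0 is pure noise, t=1 is (almost) the target
    """
    rng = rng or random.Random(0)
    # Sample Gaussian noise as the starting point of the flow.
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    # Point on the straight path between noise and target at time t.
    x_t = [(1 - (1 - sigma_min) * t) * n + t * d for n, d in zip(x0, x1)]
    # Velocity the model would be trained to predict along that path.
    velocity = [d - (1 - sigma_min) * n for n, d in zip(x0, x1)]
    # Context seen by the model: the target with the masked region zeroed out.
    context = [0.0 if m else d for m, d in zip(mask, x1)]
    return x_t, velocity, context


if __name__ == "__main__":
    features = [0.5, -0.2, 0.8, 0.1]
    fill_mask = [False, True, True, False]  # infill the two middle positions
    x_t, velocity, context = flow_matching_pair(features, fill_mask, t=0.5)
    print(context)
```

In this sketch the model would receive `x_t` and `context` and learn to predict `velocity`; at generation time, integrating the predicted velocity from noise toward t=1 fills the masked region while the unmasked audio serves as conditioning.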
In its demonstration, Meta generated rain audio with thunder and entered a series of text prompts, such as "running water accompanied by birdsong" and "a young woman speaking in a high pitch at a fast pace." It also supplied a voice sample together with a text prompt to generate speech with a specified emotion ("sad and slow") and background setting (inside a church).
Meta claims that Audiobox outperforms AudioLDM2, VoiceLDM, and TANGO in both sound quality and "accuracy of generated content," surpassing the best existing audio generation models.
Audiobox is currently available to selected researchers and academics on a trial basis so they can test the quality and safety of the model. Meta says it plans to "make the model fully available to the public in a few weeks."