Machine Power Report
Editor: Yang Wen
Fish Speech, a new AI voice model, is remarkably good at imitating voices.
The AI voice space has suddenly heated up.
More than a month ago, ChatTTS, billed as the "ceiling of open-source TTS", took off.
How popular is it?
In just three days it collected 9.2k stars on GitHub, topped GitHub Trending, and stayed at the top of the list.
Not long after, ByteDance launched a similar project, Seed-TTS, with the same promise of "generating natural, realistic speech".
In the past few days, a new player has entered the field: Fish Speech.
Reportedly, after training on 150,000 hours of data, the model is proficient in three languages: Chinese, English, and Japanese. Its speech processing is said to be close to human level, and its support for Chinese is even better.
Here are the official demos:
Chinese sentence: The lights of the world are reflected in the lake, and her desire makes the still water ripple. If the price is only loneliness, then let this desire flow freely. It flows into the world she is looking at, and into her eyes as clear as lake water.
Zhongli, The Power of Machines, 15 seconds
Video link://m.sbmmt.com/link/e056e52c8dcd019a63e6a3f169892cc9
English sentence: In the realm of advanced technology, evolution of artificial intelligence stands as a monumental achievement. This dynamic field, constantly pushing the boundaries of what machines can do, has seen rapid growth and innovation. From deciphering complex data patterns to driving cars autonomously, AI's applications are vast and diverse.
Speak English, The Power of Machines, 25 seconds
Video link://m.sbmmt.com/link/e056e52c8dcd019a63e6a3f169892cc9
Many netizens commented that although it sounds slightly robotic, the results are already very good, and the tone of voice does not feel off-putting.
However, some netizens pointed out that although the project is open source, it cannot be used commercially.
-1-
Narrating documentaries and reciting tongue twisters: does it work?
Fish Speech is an open source text-to-speech model developed by Fish Audio. According to reports, the model has only 100 million parameters and can be easily run and fine-tuned on personal devices.
Official website link: https://fish.audio/zh-CN/text-to-speech/
The official website has a simple interface. The "Discover" section lists voices trained by netizens, such as Ding Zhen, Trump, Lei Jun, Deng Ziqi, Dong Yuhui, and Shan Tianfang, as well as anime-style voices such as AD Senior Sister and Liuying.
Next, let's put it to the test.
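First up: an offbeat narration of "Animal World" (《動物世界》).
Not long ago, a post-2000s blogger, @維 C 動物園, went viral for narrating "Animal World" in a wildly unhinged style.
For example, in the episode 《鴞張跋扈》, the blogger introduces an animal called the burrowing owl (穴小鴞) with narration that is one part serious, two parts quirky, three parts humorous, and four parts baffling.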
Video link://m.sbmmt.com/link/e056e52c8dcd019a63e6a3f169892cc9
We used a documentary-style voice in Fish Speech to read this unhinged script.
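The green mantis is actually very cute, cute to death, crunchy, chicken-flavored. But none of this has anything to do with the 美洲銻, because it can barely save itself, and the yellow-bellied falcon says: delicious. Yellow-bellied falcons are found all over South America. Their eyesight is excellent, able to see things more than 10 centimeters away, so today's protagonist is not them.
The burrowing owl (穴小鴞, with 鴞 pronounced xiao), known on the street as "fresh owl meat", is, just like my cousin, less than 30 centimeters tall and very cute. As the saying goes, "A tiger on the plain is bullied by dogs; an owl in the wild is worse off than a chicken." The burrowing owl is often mocked by its neighbors for being such a poor hunter. But it does not lose heart: if it cannot find food, it goes looking for its food's food.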
We also picked the voices of Ding Zhen and Deng Ziqi to recite tongue twisters.
Video link://m.sbmmt.com/link/e056e52c8dcd019a63e6a3f169892cc9
Then we asked Trump to say an English tongue twister.
If you understand, say "understand". If you don't understand, say "don't understand". But if you understand and say "don't understand", how do I understand that you understand? Understand!
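Fish Speech's imitation ability is outstanding: it can mimic a specific person's timbre and intonation to the point of passing for the real thing, as with Shan Tianfang, Deng Ziqi, and Trump. It does have flaws, though. Sometimes it cannot read a character and mangles "穴小鴞" at random, and it does not know where to break sentences, chopping a complete sentence into fragments. In addition, once the input text gets too long, it simply stops working.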
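Since long inputs and poor sentence breaks were the two problems we hit, one possible workaround is to split the text at sentence boundaries before pasting it in, piece by piece. The following is only a minimal sketch under our own assumptions: the function name `split_for_tts` is hypothetical, the default limit of 100 characters is a placeholder (the actual length limit was not stated anywhere we could find), and the code does not call any Fish Speech API.

```python
import re
from typing import List

# A sentence ends at a run of CJK/ASCII terminators, at a period followed by
# whitespace or end of text (so "9.2k" stays intact), or at end of text.
_SENTENCE = re.compile(r".+?(?:[。！？!?]+|\.(?=\s|$)|$)", re.S)


def split_for_tts(text: str, max_chars: int = 100) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    max_chars is an assumed placeholder, not a documented Fish Speech limit.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    current = ""

    def flush() -> None:
        # Emit the pending chunk (if it has any visible text) and reset it.
        nonlocal current
        if current.strip():
            chunks.append(current.strip())
        current = ""

    for match in _SENTENCE.finditer(text):
        sentence = match.group()
        # Last resort: a single over-long sentence is cut at the hard limit.
        while len(sentence) > max_chars:
            flush()
            current = sentence[:max_chars]
            flush()
            sentence = sentence[max_chars:]
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(sentence) > max_chars:
            flush()
        current += sentence
    flush()
    return chunks
```

Each returned chunk can then be synthesized separately and the audio clips joined afterwards. Whether this actually avoids the failures we saw is untested; it only keeps each request short and cut at natural pauses.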
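-2-
Three TTS models face off
In addition to using ready-made voices, we can also build our own.
It is easy to do. Click "Build Voice" (建構聲音) at the top of the page to open a new screen, then upload a cover image, enter a voice name, and provide the audio.
For the audio input step, we can either upload an existing clip or record one ourselves, but there is a length limit; around 30 seconds works best. For example, we uploaded a clip of Xu Zhisheng doing stand-up comedy.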
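Here is the result:
Li Changgeng has been a little troubled lately. At this moment he is riding an old crane, weaving through the clouds and mist, lost in thought. As they near Qiming Hall, the old crane, perhaps muddled, does not slow down but flies straight at it. Li Changgeng snaps out of it and waves his whisk again and again; only then does the crane flap its wings in a hurry and land crookedly on the steps beside the hall.
Fish Speech reads a novel, The Power of Machines, 23 seconds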
The timbre is practically identical to Xu Zhisheng's; even the accent is very close.
We also pitted it against ChatTTS, the "ceiling of open-source TTS", and Seed-TTS.
Chinese text: Sure, hahahahaha. People who love to laugh never have bad luck. I hope you keep smiling every day.
Fish Speech:
Fish Speech, The Power of Machines, 11 seconds
ChatTTS:
ChatTTS, The Power of Machines, 6 seconds
Listen link://m.sbmmt.com/link/e056e52c8dcd019a63e6a3f169892cc9
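Since ByteDance's Seed-TTS is not yet available to try firsthand, we used its official sample.
Seed-TTS, The Power of Machines, 6 seconds
Each of the three TTS models has its strengths. If we had to rank them, Seed-TTS has the most natural sentence breaks and intonation, followed by ChatTTS. Fish Speech still falls short in those areas, but it wins on customizable voices.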
Links:
https://fish.audio/zh-CN/text-to-speech/
https://github.com/fishaudio/fish-speech
https://chattts.com/
https://bytedancespeech.github.io/seedtts_tech_report/
https://github.com/BytedanceSpeech/seed-tts-eval