How does AI keep Dong Yuhui from getting off work?

"There are still 46 minutes left, Teacher Dong's vacation will be over."

This is a comment with over 100 likes under Dong Yuhui's latest video.

During the days when he disappeared from the Oriental Selection live broadcast room, fans poured into his personal account and joked, "As long as this man takes a vacation, hundreds of thousands of people will fall out of love."

However, for the top anchors, no matter how dedicated they are to their posts, there will always be a time to go off the air.

After all, talking for hours on end, with witty remarks, consumes both mental energy and physical strength.

Under such circumstances, not only is "24-hour live broadcast" impossible, but even long-term chatting without shifts is not something everyone can withstand.

But having said that, what if there is machine support...

Especially with the recent explosion of virtual-human and related technologies, it is hard not to let the imagination run:

With AI capabilities, could a top anchor "himself" be stationed in the live broadcast room 24 hours a day?

Furthermore, is it possible to switch languages seamlessly and go international directly without Teacher Dong’s bilingual ability?

What makes 24-hour live broadcasting difficult?

Judging from the AI technologies already in use, these wild ideas are not impossible to achieve.

In terms of image technology, it is not difficult for AI to directly generate avatars or "change" faces for anchors.

For example, the fake "Tom Cruise" abroad was popular on TikTok for a while. Domestic avatars such as Liu Yexi and Li Xinglan are also very popular on domestic social media platforms; traces of "AI synthesis" are almost invisible in their videos, and the comments are full of amazement.


Not only that, AI-generated images and even videos are becoming increasingly sophisticated. Abroad, OpenAI's DALL·E 2 and Google's latest Imagen and Parti, and in China, Zhiyuan's CogVideo and Microsoft Research Asia's NUWA-Infinity, are all new results that have emerged in the past few months.

Many of these image technologies have opened APIs or accept applications for trial access. There are also many similar open source models, which makes the technology basically "playable by everyone".

Based on these technologies, many AI bloggers with “24-hour live broadcast” have appeared on various platforms at home and abroad.

But when you click on it, you will find that these AI bloggers are far less popular than real anchors or virtual anchors played by real people.


△ 24-hour AI virtual anchor, only 167 people "watched" it for half a day

The broadcast itself is also still some way from the "24-hour live broadcast" we had in mind:

When interacting, most AI anchors can do very little. Some can only sing a few songs from a limited playlist, or respond to preset instructions;


When speaking, the AI-synthesized voice of a virtual anchor is less vivid than a real anchor's, and it cannot spontaneously create emotional "surprises".

This reflects the pain point of most virtual AI anchors:

Although image generation technology has seen continuous breakthroughs in recent years, the technical threshold of speech and language AI is still high.

Take Dong Yuhui's live broadcast room as an example. Creating the image of an "AI Dong Yuhui" is not difficult, as long as Teacher Dong is willing.

However, it is still hard for the "AI version" of Teacher Dong to speak in varied tones, to sound more like the real person, to recognize the voices of the other teachers in the live broadcast room, or even to understand the "instructions" of assistants off camera.

Behind this lies a combination of speech and language AI capabilities, such as speech synthesis, voice recognition (telling who is speaking), and speech recognition.

Going one step further, taking this live broadcast room international places even higher demands on speech capabilities.


At a minimum, it needs AI subtitles that are translated online in real time.

On top of that, a truly barrier-free live broadcast room also needs simultaneous interpretation.

The good news is that more and more major technology companies have noticed this field and have been increasing their investment in recent years.

Major manufacturers at home and abroad have stepped up their efforts

Just from the perspective of theoretical research, there have been many papers in the direction of speech language AI.

Major companies such as Amazon and Google have published hundreds or even thousands of papers on conversational AI, NLP and language processing, many of them at top conferences; in 2018 alone, Meta won best paper awards at two top NLP conferences, EMNLP and ACL...


(Of course, some publish fewer papers; Apple, for example, prefers to file patents.)

Domestic companies such as BAT, Huawei and JD.com have also set up their own acoustics or NLP laboratories in recent years, and have won various paper awards at top conferences such as NAACL, AAAI and ACL.


△ Some of the ACL 2022 Outstanding Paper Awards

Take IWSLT (International Spoken Language Machine Translation Competition) as an example. It is one of the most influential spoken-language machine translation competitions in the world.

In this year's competition, Huawei ranked first in four language directions across three tasks: speech-to-speech translation, offline speech translation, and isometric (equal-length) spoken language translation.


But outside of research, major manufacturers have different ideas about how to put speech and language AI technology into practice.

In addition to optimizing their own products (voice assistants, search engines, etc.) based on the latest research, some manufacturers choose to directly open source their models or make them into AI frameworks for developers to call.

But such AI capabilities are "too esoteric" for many developers who have never worked with AI, and it is hard for them even to figure out how and where the capabilities should be used.

To a certain extent, this has also resulted in many developers not having access to the latest speech and language AI technology.

This is especially true of simultaneous interpretation AI, which has been very popular in recent years and places real demands on both latency and model performance. More and more papers and workshops on it are appearing at top conferences.

For industries such as live broadcasting, simultaneous interpretation AI is also an indispensable technology in order to expand the audience and scope of influence.

So, is there a lower-threshold way to put it into practice?

Many manufacturers have now begun to try a new approach:

Take Huawei as an example: it has built a dedicated machine learning service (ML Kit) toolkit for mobile developers, based on Huawei Mobile Services (HMS Core).

With it, developers can use these speech and language technologies in the mobile apps they build without mastering the technical details of AI.

For example, the AI subtitles (online text translation) and simultaneous interpretation mentioned above can be built on the speech and language AI capabilities in Huawei's toolkit.
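
To make the subtitle idea concrete, here is a minimal sketch in plain Java of how recognized speech segments could be turned into translated subtitle lines. It does not use ML Kit: the `SubtitlePipeline` and `Cue` names are hypothetical, and the `translator` function is a stand-in for whatever translation service an app actually calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration only; none of these names come from the HMS Core SDK. */
final class SubtitlePipeline {
    private SubtitlePipeline() {}

    /** One subtitle line together with its display window in milliseconds. */
    static final class Cue {
        final long startMs;
        final long endMs;
        final String text;

        Cue(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }
    }

    /**
     * Translates each recognized segment and keeps its original timing.
     * Blank segments (silence, noise) are skipped so no empty subtitle is shown.
     */
    static List<Cue> translate(List<Cue> recognized, Function<String, String> translator) {
        List<Cue> out = new ArrayList<>();
        for (Cue cue : recognized) {
            if (cue.text.trim().isEmpty()) {
                continue;
            }
            out.add(new Cue(cue.startMs, cue.endMs, translator.apply(cue.text)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy translator: a real app would call a translation service here.
        Function<String, String> toy = s -> "[en] " + s;
        List<Cue> recognized = Arrays.asList(
                new Cue(0, 1500, "hello"),
                new Cue(1500, 2000, "   "),
                new Cue(2000, 3200, "world"));
        for (Cue cue : translate(recognized, toy)) {
            System.out.println(cue.startMs + "-" + cue.endMs + " " + cue.text);
        }
    }
}
```

Keeping the timing on each segment is what lets the translated line appear while the anchor is still speaking the original sentence.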

The development threshold is getting lower and lower

Having said so much, how do you actually get started? Let's look at what others have already built.

For example, on the Huawei Developer Forum, someone developed a voice search shopping app for grandma based on real-time speech recognition, real-time speech transcription and other functions in ML Kit.


The steps to implement the voice function are not complicated.

First, you need to do some development preparations, including: completing real-name registration on the Huawei Developer Alliance website, configuring AppGallery Connect, and configuring the Maven repository address of the HMS Core SDK in the project.

Then, integrate the relevant service SDK. Taking the real-time speech recognition service as an example, the code is as follows:

dependencies {
    // Import the real-time speech recognition (ASR) service plugin
    implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:3.5.0.303'
}


Then, you can enter the stage of accessing the voice service.

Let's stay with the real-time speech recognition service. After setting the application's authentication information, the first step is to check the list of supported languages; the chosen language is then used when creating the intent that sets the real-time speech recognition parameters.

// Query the languages supported by real-time speech recognition
mSpeechRecognizer.getLanguages(new MLAsrRecognizer.LanguageCallback() {
    @Override
    public void onResult(List<String> result) {
        Log.i(TAG, "support languages==" + result.toString());
    }

    @Override
    public void onError(int errorCode, String errorMsg) {
        Log.e(TAG, "errorCode:" + errorCode + " errorMsg:" + errorMsg);
    }
});
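
Once the callback returns the supported list, the app still has to decide which language to put into the intent. The following is a minimal sketch of that choice in plain Java; `AsrLanguagePicker` is a hypothetical helper, not part of the HMS Core SDK, and the language codes are example values only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the HMS Core SDK. */
final class AsrLanguagePicker {
    private AsrLanguagePicker() {}

    /**
     * Returns the first preferred language that the service reports as supported,
     * or the fallback when none of them match. Matching ignores case.
     */
    static String pick(List<String> supported, List<String> preferred, String fallback) {
        for (String want : preferred) {
            for (String have : supported) {
                if (have.equalsIgnoreCase(want)) {
                    return have; // keep the spelling the service returned
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Example values only; the real list arrives in the onResult callback.
        List<String> supported = Arrays.asList("zh-CN", "en-US", "fr-FR");
        // prints en-US
        System.out.println(pick(supported, Arrays.asList("de-DE", "en-us"), "zh-CN"));
    }
}
```

Returning the service's own spelling rather than the caller's avoids passing a code the SDK might not recognize.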


The second step is to start the voice pickup activity with the intent created earlier and have the result returned to the original activity. Speech of up to 60 seconds (inclusive) can be recognized in real time.

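// REQUEST_CODE_ASR is the request code shared by the current Activity and the voice pickup Activity;
// the current Activity uses it to obtain the result produced by the pickup screen.
private static final int REQUEST_CODE_ASR = 100;
startActivityForResult(intent, REQUEST_CODE_ASR);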


Finally, override the "onActivityResult" method to process the results returned by the speech recognition service (see the reference link for detailed code).
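
The branching inside such an override can be sketched in plain Java. This is an illustration under stated assumptions, not the SDK's code: a `Map` stands in for the Android `Bundle`, and `RESULT_SUCCESS`, `KEY_TEXT` and `KEY_ERROR_MESSAGE` are hypothetical placeholders for the constants the real SDK defines. Only `REQUEST_CODE_ASR = 100` comes from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the logic of an onActivityResult override. */
final class AsrResultHandler {
    static final int REQUEST_CODE_ASR = 100;                    // from the article's snippet
    static final int RESULT_SUCCESS = 0;                        // placeholder value
    static final String KEY_TEXT = "text";                      // placeholder key
    static final String KEY_ERROR_MESSAGE = "errorMessage";     // placeholder key

    private AsrResultHandler() {}

    /**
     * Returns the recognized text, an "error: ..." description,
     * or null when the result belongs to a different request.
     */
    static String handle(int requestCode, int resultCode, Map<String, String> extras) {
        if (requestCode != REQUEST_CODE_ASR) {
            return null; // not our request; let other handlers deal with it
        }
        if (extras == null) {
            return "error: no data returned";
        }
        if (resultCode == RESULT_SUCCESS) {
            String text = extras.get(KEY_TEXT);
            return text == null ? "" : text.trim();
        }
        String message = extras.get(KEY_ERROR_MESSAGE);
        return "error: " + (message == null ? "unknown" : message);
    }

    public static void main(String[] args) {
        Map<String, String> extras = new HashMap<>();
        extras.put(KEY_TEXT, " I want to buy apples ");
        System.out.println(handle(REQUEST_CODE_ASR, RESULT_SUCCESS, extras));
    }
}
```

Checking the request code first matters because the same activity may launch other screens that also return results.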

For the development details of each step, there is a detailed development guide available on the HMS Core official website, which is very novice-friendly.

In addition, HMS Core's machine learning service works not only on Huawei phones but also on other Android devices and iOS devices. The specific version requirements are as follows.

(Figure: table of specific version requirements)

How about it? By simply accessing the SDK, you can obtain AI algorithm capabilities at the level used by major manufacturers without complicated parameter adjustment and training. Are you already thinking big?

(And it is not just speech and language technology: ML Kit also provides AI capabilities for text, images and more. For details, refer to the ML Kit official website.)


In fact, this approach of releasing long-term accumulated technical capabilities to mobile application developers through easy-to-use tools is not unique to Huawei.

Whether it is Google's GMS Core or Apple's various Kits for developers, the core purpose is to keep lowering the threshold for putting cutting-edge technologies into practice, so that more developers can stop worrying about the technology and put more energy and time into creativity.

As a result, mobile phone users are naturally happy to hear that the latest technologies can be directly experienced on their mobile phones in various fun and creative forms.

For manufacturers, a thriving application ecosystem is the most important link in the cycle: it attracts more users from outside and gathers more excellent developers inside.
