OpenAI says it has run a limited test of Voice Engine, its new voice cloning product, with a small group of business partners. The results point to promising uses for the technology, but safety concerns may keep it from being released to the public.
According to OpenAI, Voice Engine can clone a person's voice from a single 15-second recording and then generate natural-sounding speech that closely resembles the original speaker.
Once a voice is cloned, Voice Engine can convert text input into audible speech with “emotive and realistic voices.” This opens up a wide range of applications, but it also raises ethical concerns.
Promising use cases
Here are some of the ways companies in the limited test are using Voice Engine:
- Learning: Age of Learning uses Voice Engine to help kids learn to read, add voices to lessons, and create characters that answer questions.
- Translation: HeyGen uses Voice Engine to translate videos into different languages, so viewers can follow a video even if they don’t speak its original language.
- Helping Others: Dimagi trains health workers in remote areas. They use Voice Engine to give training in languages those workers understand.
- Communication: Livox creates devices for people who cannot speak. Voice Engine lets these users choose a natural-sounding voice instead of a robotic one.
- Recovery: Lifespan is testing Voice Engine to help people who have lost the ability to speak get their voice back.
Voice Engine is not the only voice cloning tool, but judging by the samples shared so far, it may be better than others.
OpenAI just launched Voice Engine,
It uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
Reference and Generated audio is very close and hard to differentiate.
More details in 🧵 pic.twitter.com/tJRrCO2WZP
— AshutoshShrivastava (@ai_for_success) March 29, 2024
Safety concerns
OpenAI said it saw potential in the use cases its test participants came up with, but that more safety measures will have to be in place before it decides “whether and how to deploy this technology at scale.”
OpenAI warned that the technology carries serious risks, which are especially relevant in an election year. The fake Biden robocalls and the fake video of Kari Lake are recent examples.
Participants in the trial also had to obtain “explicit and informed consent from the original speaker” and were not allowed to build products that let individual users create their own voices.
OpenAI says it has implemented other safety measures as well, including an audio watermark. It did not explain exactly how, but said it can provide “proactive monitoring” of how Voice Engine is used.
OpenAI is not alone. Others in the AI industry are also concerned about this kind of technology being released into the world.
Voice AI is by far the most dangerous modality.
Superhuman, persuasive voice is something we have minimal defences to.
Figuring out what to do about this should be one of our top priorities.
(We had sota models but didn’t release for this reason eg https://t.co/vjY99uCdTl) https://t.co/fKIZrVQCml
— Emad acc/acc (@EMostaque) March 29, 2024
In closing
Will Voice Engine ever be released to the general public? It seems unlikely, and that may be for the best, given how badly it could be misused.
For example, voice-based security features, such as voice recognition at banks, may no longer be safe once tools like Voice Engine exist.
OpenAI is aware of the problem and is even recommending that banks stop using voice recognition as a security measure.
It is already hard to tell what is real and what is fake, even with video, and Voice Engine makes it harder still. In the future, it may be difficult to know whether what you see and hear is real.