Meta has introduced Voicebox, a ‘state-of-the-art’ generative AI model that converts text to speech and includes audio editing and cross-language functionality.
In an Instagram Channels post shared by Mark Zuckerberg, CEO of Meta, a video demonstrated how Voicebox could read out text in a variety of vocal styles, eliminate distracting noise from audio recordings, learn and replicate speakers’ voices, and even generate output in multiple languages.
The multilingual model can generate speech in six languages: English, French, German, Spanish, Polish, and Portuguese. Other listed features include diverse text-to-speech, style transfer, content correction, in-context text-to-speech, and noise elimination.
Meta’s Friday blog post detailed the model’s ability to handle tasks it was not specifically trained to perform.
“This type of technology could be used in the future to help creators easily edit audio tracks, to allow visually impaired people to hear written messages from friends read aloud in their voices, and to enable people to speak any foreign language in their own voice,” Meta wrote on its blog.
The post also suggested that the model could give virtual assistants and non-player characters in the metaverse more natural-sounding voices.
Zuckerberg stated that Voicebox was still a “research project” but that Meta would continue to develop it.
A voice that sounded like the Meta chief said “more soon” in Polish as the video segment concluded.
Meta has been developing AI models to process a variety of media types and has made a number of these available for research purposes.