Meta AI announced Wednesday, Oct. 19, the official launch of its universal speech translator (UST) project. The project aims to develop artificial intelligence (AI) systems that enable real-time speech-to-speech translation across all languages, including those that are spoken but not widely written.
According to VentureBeat, Meta claims its model is the first AI-powered speech translation system for Hokkien, a Chinese language spoken in southeastern China, in Taiwan, and among members of the Chinese diaspora who do not speak Mandarin. The system paves the way for Hokkien speakers to hold conversations with English speakers, a major step toward connecting people all around the world, and even in the metaverse.
In February, CEO Mark Zuckerberg presented the UST project at the company's AI Inside the Lab event, which focused on building the metaverse with immersive AI.
Developing AI Speech-to-Speech Translation
Today's AI translation models focus on widely spoken written languages, while more than 40% of languages are primarily oral and are not covered by such translation technologies, according to Meta.
Meta AI prioritized resolving three barriers when developing UST. To deal with data scarcity, it expanded its training data set, added more languages, and found novel uses for the data it already had. It addressed the modeling challenges that arise as models scale to support an increasingly large number of languages. Finally, it looked for new ways to evaluate and improve the quality of its results.
The research team at Meta AI used Hokkien as a case study for an end-to-end solution, tackling everything from training data collection and modeling decisions to benchmark datasets. VentureBeat reported that the team prioritized generating human-annotated data, automatically mining data from large unlabeled speech datasets, and applying pseudo-labeling to produce weakly supervised data, as sketched below.
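Pseudo-labeling of this sort is conceptually straightforward: existing models produce machine-generated labels for unlabeled audio, and the resulting (noisier) pairs are added to the training data. Here is a minimal, hypothetical Python sketch of the idea; the transcribe and translate helpers are placeholders standing in for pretrained speech recognition and translation models, not Meta's actual components.

```python
# Hypothetical sketch of pseudo-labeling for weakly supervised speech translation.
# `transcribe` and `translate` are placeholders, not Meta's actual components.

def transcribe(audio_path: str) -> str:
    """Placeholder ASR model: returns a transcript for an audio file."""
    return "example transcript"

def translate(text: str) -> str:
    """Placeholder MT model: returns an English translation."""
    return "example translation"

def pseudo_label(unlabeled_audio: list[str]) -> list[tuple[str, str]]:
    """Create weakly supervised (audio, translation) training pairs."""
    pairs = []
    for path in unlabeled_audio:
        transcript = transcribe(path)         # machine-generated, may be noisy
        translation = translate(transcript)   # cascaded pseudo-label
        pairs.append((path, translation))
    return pairs

if __name__ == "__main__":
    print(pseudo_label(["clip_001.wav", "clip_002.wav"]))
```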
For the modeling, Meta AI applied recent advancements in using self-supervised discrete representations as targets for prediction in speech-to-speech translation. Additionally, they demonstrated the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, to train their models.
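The "discrete representations" here are typically unit IDs obtained by clustering frame-level speech features (as in HuBERT-style unit extraction), so the translation model predicts short integer sequences rather than raw waveforms. The following is a rough illustration of that idea using k-means over dummy features; the real pipeline clusters learned self-supervised representations, not random vectors.

```python
# Rough illustration: turning continuous speech features into a discrete
# unit sequence via k-means clustering. Random features stand in for real
# self-supervised representations (e.g., HuBERT frames).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 64))  # 500 frames x 64-dim features (dummy)

# Cluster the feature space; each cluster ID becomes an "acoustic unit".
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(frames)

# An utterance becomes a sequence of unit IDs -- the prediction target
# for a speech-to-unit translation model.
units = kmeans.predict(frames)

# Collapse consecutive duplicate units, as is common for unit sequences.
deduped = [int(units[0])] + [int(u) for i, u in enumerate(units[1:], 1) if u != units[i - 1]]
print(deduped[:20])
```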
Meta AI also announced that it will release a speech-to-speech translation benchmark set to aid further research in this area.
How UST Works
Input speech is translated into a sequence of acoustic units using speech-to-unit translation (S2UT), an approach Meta previously pioneered; waveforms are then generated from the produced units. In addition, Meta AI adopted UnitY for a two-pass decoding mechanism, in which the first-pass decoder generates text in a related language (Mandarin) and the second-pass decoder creates units.
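A structural sketch of that two-pass pipeline may help. In the Python below, every component is a trivial placeholder (the function names and stand-in return values are assumptions for illustration, not Meta's code): the point is only the control flow of encode, decode text, decode units, then synthesize speech.

```python
# Structural sketch of UnitY-style two-pass decoding, with placeholder
# components; model internals and tensor shapes are deliberately omitted.

def encode_speech(waveform: list[float]) -> list[float]:
    """Placeholder speech encoder: waveform -> hidden states."""
    return waveform  # stand-in

def first_pass_text_decoder(hidden: list[float]) -> str:
    """First pass: decode text in a related high-resource language (Mandarin)."""
    return "<mandarin text>"

def second_pass_unit_decoder(hidden: list[float], text: str) -> list[int]:
    """Second pass: decode discrete acoustic units, conditioned on the text."""
    return [17, 4, 88, 23]  # stand-in unit IDs

def unit_vocoder(units: list[int]) -> list[float]:
    """Placeholder unit-to-waveform vocoder."""
    return [0.0] * len(units)

def translate_speech(waveform: list[float]) -> list[float]:
    hidden = encode_speech(waveform)
    text = first_pass_text_decoder(hidden)           # pass 1: text
    units = second_pass_unit_decoder(hidden, text)   # pass 2: units
    return unit_vocoder(units)                       # synthesize output speech

print(translate_speech([0.1, -0.2, 0.05]))
```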
To enable automated evaluation, Meta AI created a system that transcribes Hokkien speech into a standardized phonetic notation known as "Tâi-lô." This allowed the data science team to compute BLEU scores (a standard machine translation metric) at the syllable level and readily compare the translation quality of different approaches.
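Because Tâi-lô writes each Hokkien syllable as a hyphen- or space-delimited token, syllable-level BLEU amounts to tokenizing on those delimiters and scoring n-gram overlap as usual. The sketch below is a simplified, self-contained sentence-level BLEU for illustration, not Meta's exact scoring script.

```python
# Simplified syllable-level BLEU for Tâi-lô romanized Hokkien.
# Illustrative reimplementation, not Meta's evaluation pipeline.
import math
import re
from collections import Counter

def tailo_syllables(text: str) -> list[str]:
    """Split Tâi-lô text into syllables (hyphen- or space-delimited)."""
    return [s for s in re.split(r"[\s\-]+", text.lower()) if s]

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = tailo_syllables(hypothesis), tailo_syllables(reference)
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        # Add-one smoothing avoids log(0) on short sentences.
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)

# Hyphen vs. space segmentation yields the same syllables, so this scores 1.0.
print(bleu("guá sī tâi-uân-lâng", "guá sī tâi-uân lâng"))
```

Note how syllable-level tokenization makes the score insensitive to whether syllables are joined by hyphens or spaces, which is one practical reason to evaluate at that granularity.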
In addition to devising a means of evaluating Hokkien-English speech translations, the group developed the first Hokkien-English bidirectional speech-to-speech translation benchmark dataset, based on a Hokkien speech corpus called Taiwanese Across Taiwan.
This article is owned by Tech Times
Written by Trisha Kae Andrada