Microsoft Achieves Human Parity In Conversational Speech Recognition Thanks To New System

Microsoft Artificial Intelligence and Research engineers have developed a speech recognition system whose transcription accuracy is on par with that of human transcribers.

The system reportedly set a record low word error rate of 5.9 percent, marking the first time the rate has fallen below 6 percent.

According to the researchers, this is comparable to the error rate of the people who transcribed the same conversations as the software.
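Word error rate is conventionally computed as the number of substituted, deleted and inserted words needed to turn the system's transcript into a reference transcript, divided by the number of words in the reference. The sketch below shows that standard definition using a word-level edit distance. It is an illustration only: the function name is hypothetical, and benchmark scoring tools also normalize text (case, punctuation, filler words) in ways this sketch does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; it simply splits on whitespace and
    does none of the text normalization real scoring tools apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

In this example two of the nine reference words are substituted ("jumps" and "the"), so the WER is 2/9. On that scale, a 5.9 percent rate means roughly six errors for every hundred reference words.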

"We've reached human parity. This is an historic achievement," says Xuedong Huang, Microsoft's lead speech scientist.

According to the company, the researchers used neural language models that learn not only how words sound but also how they relate to one another. For instance, the software can recognize that the words "fast" and "quick" have similar meanings.
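The article does not describe how Microsoft's models represent word relationships. A common general approach in neural language models is to map each word to a learned vector so that words with related meanings end up with similar vectors, often compared using cosine similarity. The sketch below illustrates only that general idea: the three-word vocabulary and its four-dimensional vectors are invented for the example and do not come from any real model.

```python
import math

# Hypothetical 4-dimensional embeddings, invented purely for illustration.
# Real models learn vectors with hundreds of dimensions from large corpora.
EMBEDDINGS = {
    "fast":   [0.90, 0.80, 0.10, 0.00],
    "quick":  [0.85, 0.75, 0.20, 0.05],
    "banana": [0.05, 0.10, 0.90, 0.80],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Return the cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for other in ("quick", "banana"):
        score = cosine_similarity(EMBEDDINGS["fast"], EMBEDDINGS[other])
        print(f"fast vs {other}: {score:.3f}")
```

With these made-up vectors, "fast" scores close to 1 against "quick" and far lower against the unrelated "banana", which is the kind of signal a language model can use to favor plausible word sequences.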

Microsoft plans to use this technology to improve its virtual assistant, Cortana, as well as accessibility tools such as speech-to-text transcription software.

Tom Brant of PC Magazine notes that five years ago, the best speech recognition systems typically produced transcriptions with word error rates of 20 to 25 percent, which makes this result a substantial improvement.

"Even five years ago, I wouldn't have thought we could have achieved this. I just wouldn't have thought it would be possible," Harry Shum, executive vice president of the Microsoft Artificial Intelligence and Research group, says.

However, the company stresses that this does not mean the system can recognize and transcribe speech perfectly, noting that even humans cannot. Still, the result is a significant milestone for neural network research.

It's also worth noting that the Redmond company reached a word error rate of 6.3 percent back in September, which was itself an impressive result at the time.
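The article quotes absolute word error rates. Another way to compare them is relative error reduction: the fraction of the older system's errors that the newer system eliminates. The short sketch below applies that arithmetic to the figures reported above (5.9 percent now, 6.3 percent in September, and 20 to 25 percent for systems five years earlier). The helper name is hypothetical.

```python
def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction of old_wer's errors removed by new_wer (0.25 means 25% fewer errors)."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    new = 5.9  # percent, the record reported in the article
    baselines = {
        "September result": 6.3,
        "typical system five years earlier (low end)": 20.0,
        "typical system five years earlier (high end)": 25.0,
    }
    for label, old in baselines.items():
        print(f"vs {label} ({old}%): {relative_reduction(old, new):.1%} fewer errors")
```

By this measure, the step from 6.3 to 5.9 percent removes about 6 percent of the remaining errors, while the improvement over the 20 to 25 percent systems amounts to roughly 70 to 76 percent fewer errors.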

The research paper, authored by Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu and Geoffrey Zweig, is available online.

That said, this is only the beginning for speech recognition technology: Microsoft still has a long way to go before its software can reliably understand many different kinds of voices in a wide range of environments. Even so, this achievement is a significant step forward.

What do you think of Microsoft's achievement? Let us know in the comments section below.

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.