The tech world welcomed Google’s announcement last year that it was developing prototype devices to bring real-time voice translation to users of its Android mobile platform. But Hugo Barra, Google’s vice president of Android product management, conceded that the company was probably a few years away from any kind of public release. Now Microsoft has announced that it will release a real-time translation tool for the Skype platform by the end of this year. Is this just the latest play in the real-time translation war, or Microsoft’s checkmate?
VOIP Reaches for the Sky
Skype is a freemium voice-over-IP (VOIP) application that lets its users communicate via voice, text, and video with other Skype users and place calls to landlines. Originally placed on the peer-to-peer market by a team of European developers in 2003, Skype was bought by several companies between 2005 and 2009 before it was acquired by Microsoft in 2011 for $8.5 billion. At the time of its acquisition, the application had 124 million active monthly users (Wired) and represented 13 percent of traffic on the international call market (Wikipedia). Today, Skype, with headquarters in London, has 300 million active users and represents approximately one-third of the international call market (Mashable).
On Tuesday at Re/code’s inaugural Code Conference (#CodeCon) in Rancho Palos Verdes, Microsoft CEO Satya Nadella spoke with Re/code editors Kara Swisher and Walt Mossberg about, among other things, how Microsoft is ready to bring real-time voice translation to Skype users worldwide. He acknowledged that the release is no miracle birth but rather the result of many years of advances and collaboration in speech recognition, machine translation, and speech synthesis.
Real-Time Talk and Real-Talk Problems
The promise of real-time voice translation has fueled our imaginations for generations, making it into our novels as the Babel fish and into our television and film series as the Universal Translator. The reality is far from smooth.
As the embedded video shows, the product aims for the “gist” rather than precision. In the video, starting at 1:30, Gurdeep Singh Pall, Microsoft Vice President for Skype and Lync, demonstrated Skype Translate in a test conversation with his German colleague Diane Heinrichs, Microsoft Communications Manager for Social Enterprise & Market Strategy. Her simple, short, and clearly pronounced sentences yield better results than his longer and more casually delivered replies. Speakers who are fluent in both English and German will recognize the challenge.
Pall: Hello, Diane. How are you doing? — Hallo, Diane. Wie dir’s geht?
Heinrichs: Hallo, mein Freund? — Hello, my friend.
Heinrichs: Ist es wahr, dass du bald nach London umziehst? — Is it true that you were (sic) soon moving to London?
Pall: Yes, I’m currently planning to move there sometime between the middle of June to [sic] the, uh, beginning of July if everything goes per track. — Ja. Ich plane derzeit dort irgendwann zwischen Mitte Juni bis Anfang Juli zu verschieben, wenn alles geht, aber verfolgen.
Competition or Cooperation?
Microsoft is not the only major brand looking to capitalize on developments in speech-to-speech technology. Last July, Facebook’s acquisition of Mobile Technologies, the developers behind the Jibbigo speech translation app, fueled speculation that the company saw real, long-term shareholder value in translation technology.
Jibbigo, of course, also benefited from a previous investment by the U.S. military, the result of a collaboration with Carnegie Mellon University that fed some 40,000 Iraqi Arabic words into the system to generate voice-to-voice translation into English.
Thereafter, in July, AT&T announced that it was pilot testing its own real-time, cloud-computing-backed speech translation system, saying that the company’s goal was to make voice-to-voice translation so convincing “that it will sound like you.”
Will all this competition bring us closer to the real-time translation of our dreams? Not everyone thinks so. Earlier this month, Slate wrote that the lack of cooperation was a defining challenge for speech recognition technologies, undermining their potential. “The people working on this can’t even decide on an acronym,” wrote Slate.
Microsoft believes we should hold out hope that Skype can help us achieve the hitherto impossible.
“It’s been a dream of humanity ever since we started to speak and we wanted to cross the language boundary,” said Nadella.
A worthy endeavor, indeed.