Google’s DeepMind unit, which is working to develop super-intelligent computers, has created a system for machine-generated speech that it says outperforms existing technology by 50 percent.
DeepMind, the U.K.-based company Google acquired for about 400 million pounds ($533 million) in 2014, said in a blog post on Friday that it has developed an artificial intelligence called WaveNet that can mimic human speech by learning how to form the individual sound waves a human voice creates.
In blind tests for U.S. English and Mandarin Chinese, human listeners found that WaveNet-generated speech sounded more natural than any of Google’s existing text-to-speech programs, which are based on different technologies. WaveNet still fell short of recordings of actual human speech, however.
Many computer-generated speech programs start from a large data set of short recordings of a single human speaker, then combine speech fragments to form new words. The result is intelligible and sounds human, if not completely natural. The drawback, however, is that the sound of the voice cannot be easily modified.
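This fragment-combining (concatenative) approach can be sketched as follows. The unit names, waveform contents, and crossfade length below are all hypothetical placeholders for illustration, not any real system’s data:

```python
import numpy as np

# Hypothetical unit database: each fragment name maps to a short
# pre-recorded waveform (synthetic placeholder arrays here).
UNIT_DB = {
    "he": np.linspace(0.0, 0.2, 800),
    "el": np.linspace(0.2, -0.1, 800),
    "lo": np.linspace(-0.1, 0.0, 800),
}

def synthesize(units, crossfade=160):
    """Concatenate recorded fragments, crossfading at each join
    to smooth the transition between units."""
    out = UNIT_DB[units[0]].copy()
    for name in units[1:]:
        nxt = UNIT_DB[name]
        fade = np.linspace(1.0, 0.0, crossfade)
        # Blend the tail of the signal so far with the head of the next unit.
        out[-crossfade:] = out[-crossfade:] * fade + nxt[:crossfade] * (1 - fade)
        out = np.concatenate([out, nxt[crossfade:]])
    return out

wave = synthesize(["he", "el", "lo"])
print(len(wave))  # 800 + 2 * (800 - 160) = 2080 samples
```

Because the output is stitched from fixed recordings, changing the voice means re-recording the entire unit database, which is why such systems are hard to modify.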
Other systems form the voice completely electronically, usually based on rules about how certain letter combinations are pronounced. These systems allow the sound of the voice to be manipulated easily, but DeepMind said they have tended to sound less natural than computer-generated speech based on recordings of human speakers.
WaveNet is a neural network, a system designed to mimic how parts of the human brain function. Such networks need to be trained with large data sets.
WaveNet won’t have immediate commercial applications because the system requires too much computational power: DeepMind said it has to sample the audio signal it is being trained on 16,000 times per second or more, and then form a prediction of what the sound wave should look like at each of those samples based on all the prior ones. Even the DeepMind researchers acknowledged in their blog post that this “is a clearly challenging task.”
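The autoregressive idea described above, where each audio sample is predicted from everything generated before it, can be sketched with a toy predictor. The weighted-average rule below is a hypothetical stand-in for WaveNet’s deep network; only the one-sample-at-a-time loop reflects the approach the researchers describe:

```python
import numpy as np

SAMPLE_RATE = 16_000  # WaveNet samples audio 16,000 times per second or more

def predict_next(history, order=4):
    """Toy autoregressive predictor: estimate the next sample as a
    weighted average of the last `order` samples (a placeholder for
    WaveNet's actual neural network)."""
    recent = history[-order:]
    weights = np.arange(1, len(recent) + 1, dtype=float)
    return float(np.dot(recent, weights) / weights.sum())

def generate(seed, n_samples):
    """Generate audio one sample at a time; each prediction is
    conditioned on the entire signal produced so far."""
    signal = list(seed)
    for _ in range(n_samples):
        signal.append(predict_next(np.array(signal)))
    return signal

# One second of audio would require 16,000 of these sequential steps,
# which is why generation is so computationally expensive.
out = generate(seed=[0.0, 0.1, 0.2, 0.3], n_samples=8)
print(len(out))  # 4 seed samples + 8 generated = 12
```

The strictly sequential loop is the key point: each new sample cannot be computed until all earlier ones exist, so the work cannot be trivially parallelized.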
Despite that drawback, DeepMind’s breakthrough is likely to command close attention from tech companies. Speech is becoming an increasingly important way humans interact with everything from mobile phones to cars. Amazon.com Inc., Apple Inc., Microsoft Corp. and Alphabet Inc.’s Google have all invested in personal digital assistants that primarily interact with users through speech.
Mark Bennett, the international director of Google Play, which sells Android apps, told an Android developer conference in London last week that 20 percent of mobile searches on Google are made by voice, not written text.
Even as researchers have made great strides in getting computers to understand spoken language, computers’ ability to talk back in ways that seem fully human has lagged.
WaveNet is yet another coup for DeepMind, whose widely celebrated AlphaGo system beat the world’s top-ranked human player in the strategy game Go this year.
Google has disclosed little about how DeepMind’s research has helped it commercially, although it has revealed that it used DeepMind’s technology to reduce the power demands of its data centers by 40 percent, saving enough money to justify the amount it spent to buy the London AI company.
(Adapted from Bloomberg)