Baidu, often called the “Chinese Google”, has published a technical paper describing its latest development in artificial intelligence (AI). The system, built on a neural network, can clone a human voice after analyzing even a very short sample of the original speech. The program not only imitates human speech very well, it can also alter the cloned voice, for example by giving it a different accent.
Examples of human voices simulated by the neural network can be found at this link.
Previous versions of this technology needed longer samples to imitate human speech. In 2017, the Baidu Deep Voice engineering team introduced a technology capable of simulating human speech from 30 minutes of source material. Competing systems needed somewhat less. For example, Adobe's VoCo could simulate human speech from a 20-minute sample. Lyrebird, developed by a Canadian startup, was even more impressive: it needed only one minute of the original voice to create an imitation. Baidu's new development goes further still: it needs only a few seconds of original material.
At first glance, such technologies may seem to have no practical benefit and to be nothing more than a toy. But that would be a big mistake. These technologies will certainly find their uses. Imagine a person who has lost the ability to speak regaining it, even if through a machine. Or a restless child who refuses to go to bed until he hears your voice reading a fairy tale, while you are far away and simply unable to call him. These are only a small fraction of the opportunities this technology could open up.
In addition, this technology could be used, for example, to create personalized digital assistants that talk to you in a real human voice rather than a computer one.
But, like any other technology, this one has its flip side. It may be abused and used for not entirely legitimate purposes. New Scientist reports that the current version of the program was able to create a voice that fooled a voice recognition system in 95 percent of tests. Human listeners, on the whole, rated the quality of the cloned samples at 3.16 out of 4. With results like these, sooner or later we may face cases of fraud using artificial intelligence, the journalists say.
Programs already exist that use neural networks to modify or even imitate human faces on video. For example, the Internet has recently been swept by a wave of pornographic videos in which the performers' faces are replaced with those of celebrities. Of course, for now all this looks like little more than a prank. But soon, combined with technology that can accurately simulate a particular voice, it may bring another wave of “fake news” in which prominent figures from various fields, including politics, say things they would never actually say.
Many people can already be deceived today with seemingly ordinary programs like Photoshop. Imagine the problems we may face when artificial intelligence, whose capabilities exceed those of Photoshop billions of times over, falls into the wrong hands.