“Deepfake” videos and audio have the potential to destroy reputations, sway elections, and maybe even start a world war. Deepfake (a portmanteau of “deep learning” and “fake”) is a technique for human image synthesis based on artificial intelligence. It is used to combine and superimpose existing images and videos onto source images or videos, often using a machine learning technique called a “generative adversarial network” (GAN).
These types of fake videos and audio are becoming more and more realistic. Soon, the human eye and ear will not be able to tell what is real and what is fake.
That said, according to The Verge, deepfake propaganda is not currently a real problem:
It’s a good question why deepfakes haven’t taken off as a propaganda technique. Part of the issue is that they’re too easy to track. The existing deepfake architectures leave predictable artifacts on doctored video, which are easy for a machine learning algorithm to detect. Some detection algorithms are publicly available, and Facebook has been using its own proprietary system to filter for doctored video since September.
How Will “Deepfake” Impact the Podcast and Audio Industry?
As with video, fake audio clips are becoming easier to detect. In fact, in January 2019 Google released a synthetic speech dataset for “deepfake” audio detection research:
Over the last few years, there’s been an explosion of new research using neural networks to simulate a human voice. These models, including many developed at Google, can generate increasingly realistic, human-like speech.
While the progress is exciting, we’re keenly aware of the risks this technology can pose if used with the intent to cause harm. Malicious actors may synthesize speech to try to fool voice authentication systems, or they may create forged audio recordings to defame public figures. Perhaps equally concerning, public awareness of “deep fakes” (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media: as it becomes harder to distinguish real from tampered content, bad actors can more credibly claim that authentic data is fake.
As for the podcast industry in general, we may not see much of this “deepfake” audio. If we do, it will have very little impact.
Theoretically, someone could feed recordings of a famous person’s voice into a “deepfake” algorithm and use the synthesized voice to make it sound as if that person were hosting a podcast. But what is the point of that?
We believe a more logical and practical use case would be to leverage this technology to make the podcast host’s voice scalable and dynamic for host-read advertising. Used for this type of automated audio advertising, it could allow dynamically generated host-read ads to be served in real time based on each listener’s location, preferences, demographics, etc. Now that could be a better use of this otherwise useless technology.
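To make the idea concrete, here is a minimal sketch of how a dynamically generated host-read ad might be assembled. Everything in it is an assumption made for illustration: the `Listener` fields, the ad scripts, and the `synthesize_host_voice` placeholder are hypothetical, and no real ad server or voice-synthesis API is implied.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Listener:
    """Hypothetical listener profile, as an ad server might supply it."""
    city: str
    interest: str


# Hypothetical host-read ad scripts, keyed by listener interest.
AD_SCRIPTS = {
    "fitness": Template("Hey $city listeners, our sponsor makes my favorite running shoes."),
    "cooking": Template("Cooking tonight in $city? Our sponsor delivers fresh ingredients."),
}
# Fallback script when no interest-specific script exists.
DEFAULT_SCRIPT = Template("Thanks for listening in $city. Please check out our sponsor.")


def build_ad_script(listener: Listener) -> str:
    """Pick the script matching the listener's interest and fill in their city."""
    template = AD_SCRIPTS.get(listener.interest, DEFAULT_SCRIPT)
    return template.substitute(city=listener.city)


def synthesize_host_voice(script: str) -> bytes:
    """Placeholder for a voice-synthesis step (assumption: not a real API).

    A real system would return audio rendered in the host's synthesized
    voice; this stub only returns the script text as UTF-8 bytes.
    """
    return script.encode("utf-8")


listener = Listener(city="Austin", interest="fitness")
audio = synthesize_host_voice(build_ad_script(listener))
```

In this sketch the personalization happens entirely in the text of the script, so the synthesized voice is needed only for the final rendering step, once per listener segment.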