Your boss contacts you over Skype. You see her face and hear her voice, asking you to transfer a considerable amount of money to a firm you’ve never heard of. Would you ask for written confirmation of her orders? Or would you simply follow through on her instructions?
I would certainly be taken aback by such a request, but then again, this is nowhere near a normal transaction between me and my boss. Still, given the success rate of CEO fraud, which relies on far less convincing methods, threat actors would only need to find the right person to contact to fool employees into sending the money.
Now imagine the success rate of CEO fraud if the scam artists could actually replicate your boss’s face and voice in such a Skype call. Using Deepfake techniques, they may reach that level in the not-too-distant future.
What is Deepfake?
The word “Deepfake” was created by mashing “deep learning” and “fake” together. It is a method of creating human images based on artificial intelligence (AI). Simply put, creators feed a computer data consisting of a lot of facial expressions of a person and find someone who can imitate that person’s voice. The AI algorithm is then able to match the mouth and face to synchronize with the spoken words. The result is a near-perfect “lip sync” with the matching face and voice.
Compared with the old Photoshop techniques used to create fake evidence, this would qualify as “videoshop 3.0.”
Where did it come from?
The first commotion about this technique arose when a Reddit user going by the handle DeepFakes posted explicit videos of celebrities that looked realistic. He generated these videos by replacing the faces of the original pornographic actors with those of the celebrities. Thanks to deep learning, these “face swaps” were nearly impossible to detect.
DeepFakes posted the code he used to create these videos on GitHub, and soon enough, a lot of people were learning how to create their own videos, finding new use cases as they went along. Forums about Deepfakes were immensely popular, which was immediately capitalized on by coinminers. At some point, a user-friendly version of the Deepfake technology was even bundled with a cryptominer.
The technology
Deepfake effects are achieved by using a deep learning technology called an autoencoder. Input is compressed, or encoded, into a small representation, which can then be decoded to reproduce the original input so that it matches previous images in the same context (here, video). Creators need enough relevant data to achieve this, though. To create a Deepfake image, the producer reproduces face B while using face A as input. So, while the owner of face A is talking on the caller side of the Skype call, the receiver sees face B making the movements. The receiver observes the call as if B were the one doing the talking.
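To make that idea more concrete, here is a minimal sketch in PyTorch of the shared-encoder, two-decoder setup described above. This is not the original DeepFakes code; the image size, layer widths, and names are illustrative assumptions, not details of any real Deepfake tool.

```python
# Minimal sketch of the face-swap autoencoder idea: one shared encoder,
# two decoders (one per face). All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class FaceSwapAutoencoder(nn.Module):
    def __init__(self, latent_dim=256, img_pixels=64 * 64 * 3):
        super().__init__()
        # One shared encoder learns a compact representation of "a face"
        self.encoder = nn.Sequential(
            nn.Linear(img_pixels, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim),
        )
        # Two decoders: one trained to reconstruct face A, one to reconstruct face B
        self.decoder_a = self._make_decoder(latent_dim, img_pixels)
        self.decoder_b = self._make_decoder(latent_dim, img_pixels)

    @staticmethod
    def _make_decoder(latent_dim, img_pixels):
        return nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, img_pixels), nn.Sigmoid(),
        )

    def forward(self, x, which):
        z = self.encoder(x)
        return self.decoder_a(z) if which == "a" else self.decoder_b(z)

# Training would reconstruct A's frames with decoder_a and B's frames with
# decoder_b. The swap: encode a frame of face A, but decode it with decoder_b,
# so face B reproduces A's expression and pose.
model = FaceSwapAutoencoder()
frame_of_a = torch.rand(1, 64 * 64 * 3)   # stand-in for a flattened video frame of A
swapped = model(frame_of_a, which="b")     # face B performing A's movements
```

The key design point is the shared encoder: because both decoders read the same compact representation, expression and head pose captured from face A can be rendered as face B.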
The more pictures of the targeted person we can feed the algorithm, the more realistic the imitation’s facial expressions become.
Given that an AI already exists that can be trained to mimic a voice after listening to it for about a minute, it probably won’t take long before the human voice impersonator can be replaced by a routine that repeats the caller’s sentences in a reasonable imitation of the voice the receiver associates with the face on the screen.
Abuse cases
As mentioned earlier, the technology was first used to replace actors in pornographic movies with celebrities. We have also seen some examples of how this technology could be used to create “deep fake news.”
So, how long will it take scammers to get the hang of this to create elaborate hoaxes, fake promotional material, and conduct realistic fraud?
Hoaxes and other fake news are damaging enough as it is. By nature, people are inclined to believe what they see. If they can see it “on video” with their own eyes, why would they doubt it?
You may find the story about the “War of the Worlds” broadcast and the ensuing panic funny, but I’m pretty sure the more than a million people who were struck with panic would not agree with you. And that was just a radio broadcast. Imagine something similar with “live footage” using the faces and voices of your favorite news anchors (or rather, convincing imitations thereof). Imagine if threat actors could spoof a terrorist attack or mass shooting. There are many more nefarious possibilities.
Countermeasures
The Defense Advanced Research Projects Agency (DARPA) is aware of the dangers that Deepfakes can pose.
“While many manipulations are benign, performed for fun or for artistic value, others are for adversarial purposes, such as propaganda or misinformation campaigns. This manipulation of visual media is enabled by the wide-scale availability of sophisticated image and video editing applications, as well as automated manipulation algorithms that permit editing in ways that are very difficult to detect either visually or with current image analysis and visual media forensics tools. The forensic tools used today lack robustness and scalability, and address only some aspects of media authentication; an end-to-end platform to perform a complete and automated forensic analysis does not exist.”
DARPA has launched the MediFor program to stimulate researchers to develop technology that can detect manipulations and even provide information about how the manipulations were done.
One of the signs researchers now look for when trying to uncover a doctored video is how often the person in the video blinks. Where a normal person blinks every few seconds, a Deepfake imitation might not blink at all, or not often enough to be convincing. One reason for this effect is that pictures of people with their eyes closed don’t get published very often, so creators would have to use actual video footage as input to get the blinking frequency right.
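As an illustration of that heuristic, here is a minimal sketch that counts blinks from per-frame eye landmarks using the common eye aspect ratio (EAR) measure. The landmark source, the 0.2 threshold, and the 30 fps frame rate are assumptions made for this example, not details of any particular forensic tool.

```python
# Minimal sketch of blink counting via the eye aspect ratio (EAR).
# Assumes a face-landmark detector has already produced six (x, y) points per
# eye per frame, ordered as in the common 68-point landmark layout.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: array of shape (6, 2) with landmark coordinates around one eye."""
    eye = np.asarray(eye, dtype=float)
    vertical_1 = np.linalg.norm(eye[1] - eye[5])
    vertical_2 = np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return (vertical_1 + vertical_2) / (2.0 * horizontal)

def blinks_per_minute(per_frame_eyes, fps=30, threshold=0.2):
    """per_frame_eyes: list of (left_eye, right_eye) landmark arrays, one per frame."""
    blink_count = 0
    eyes_closed = False
    for left, right in per_frame_eyes:
        ear = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0
        if ear < threshold and not eyes_closed:   # eyes just closed: count a blink
            eyes_closed = True
            blink_count += 1
        elif ear >= threshold:                    # eyes open again
            eyes_closed = False
    minutes = len(per_frame_eyes) / fps / 60.0
    return blink_count / minutes if minutes else 0.0

# A real adult typically blinks every few seconds; a rate near zero over a long
# clip would be one (weak) hint that the footage may be synthetic.
```

In practice, a signal like this would be combined with other forensic cues rather than used on its own, since blink rate alone is easy to get right once creators account for it.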
As technology advances, we will undoubtedly see improvements on both the imitating and the defensive sides. What already seems evident is that it will take more than a trained eye to recognize Deepfake videos; we will need machine learning algorithms that can adapt as well.
Anti-video fraud
With the exceptional speed of developments in the Deepfakes field, it seems likely that you will see a hoax or scam using this method in the near future. Maybe we will even start using specialized anti-video fraud software at some point, in the same way as we have become accustomed to the use of anti-spam and anti-malware protection.
Stay safe and be vigilant!