I would like to take a closer look at a challenge that is currently maturing and getting ready to dramatically interfere with our everyday life, yet one that many people have still never heard of. It is about what I call the “falsification of everything in (digital) media”, i.e. the disruption of the authenticity we generally grant (or used to grant) to the media.
No surprise, key ingredients in this development are rooted in the field of artificial intelligence and underlying technologies and concepts like deep learning and generative adversarial networks. But first, let’s briefly revisit the last 20-25 years of digitalization of our media consumption.
Digitalization has dramatically transformed, and oftentimes improved, the way we consume, interact with and regard media in everyday life
The media industry is perhaps the place that was disrupted in the most radical way, and it was certainly the first industry to feel the impact on both user experience (and expectations) and its business model. Newspapers, magazines, TV, radio, local news, you name it – everything works differently today.
There are lots of benefits for users and consumers. We have any information at our fingertips, first on our desktop and laptop computers and, for about ten years now, on mobile devices wherever we are. Stories are presented in visually rich form, enhanced with multimedia elements, video snippets, and streams. The world of video and audiovisual media has been transformed completely, with new platforms, professional production methods available to virtually anyone, and asynchronous consumption whenever we want. We have new formats like vlogs and podcasts. We can take a library of a million books around with us, and we can listen to books. We have a larger variety of news sources and many of them are ‘free’ (but you pay with your data in advertising-based business models).
And we get loads of recommendations from our ‘friends’ (the quotation marks are very intentional). Moreover, not only can all of us very easily recommend news, articles, and media to other people; every one of us also has access to platforms that potentially allow us to reach an audience of millions with our own content (and that is where it gets tricky, but more on this later).
However, with the new actors (search engines, social platforms, online advertising platforms & aggregators) the business model of the publishing industry has taken a huge hit. Search engines (e.g. Google), social platforms (e.g. Facebook) and online classifieds providers (e.g. Craigslist) eat journalists’ lunch and have led to very questionable business practices in the industry (strictly performance-driven publishing, click-bait headlines, a dramatic decrease in quality, intrusive tracking and advertising formats, to name just a few). But this is not what I am looking into today; that is an important deep-dive in itself (and a challenge that we need to solve, too).
What I am after is the credibility and authenticity that we attribute to media. Shady practices by desperate players whose business models are under pressure are one ingredient. But what is more threatening is a combination of two trends:
- Everyone can be a publisher of news/updates of any kind today
- Technology is progressing at an ever-increasing speed – allowing everyone to produce and to alter media in (scary) professional ways
At a time when different political camps can hardly agree on facts, the persuasiveness of images and video recordings so far seemed to bring welcome clarity. Audio and video recordings allow people to become first-hand witnesses of an event so that they no longer have to decide whether to trust another, possibly politically opposed, authority. With smartphones that easily capture audio and video content, and social media platforms that share and use that content, people today can rely on their own eyes and ears to an unparalleled degree. But therein lies a big danger.
We have only just started to realize how unprepared we apparently are when it comes to people’s perception and quality/authenticity judgments in digital media
In our non-digital past, the mechanics of the media industry were predominantly such that magazines, newspapers and television stations managed to control the flow of information to the public. Journalists set strict professional standards to control the quality of news, and due to the relatively small number of media companies, only a limited number of individuals and organizations were able to disseminate information widely.
Over the last ten to twelve years, however, as social media platforms such as Facebook and Twitter have become more widespread, more and more people have adopted a model of information consumption that depends on a large number of users generating relatively unfiltered content. Users tend to curate their experiences so that they mostly encounter perspectives with which they already agree (a tendency reinforced by platform algorithms) and turn their social media feeds into echo chambers.
Since at least 2016, we have been familiar with the issue of ‘fake news’. The internet allows everyone to publish. It allows everyone to create professional-looking websites and news offerings. Social networks allow for lightning-fast viral distribution of any content.
The internet and social networks have put these “news sources” in front of demographics who previously did not consume serious media and news formats at all, and who therefore lack the experience and the ability to make an informed judgment about the authenticity and trustworthiness of news sources. Nowadays the majority of people (at least this is what polls and surveys say) trust their personal network more than traditional media when it comes to the news.
And because of this, we as a society are much more vulnerable than in the old structures when it comes to misinformation, false information, and propaganda. We are not short of examples where very obviously false information spread virally, e.g. ‘Incoming national security adviser’s son spreads fake news about D.C. pizza shop’ or ‘Duped by fake news story, Pakistani minister threatens nuclear war with Israel’.
Those are examples of professional actors, i.e. (foreign) governments or organizations, plotting fake news. This is happening, and it has been happening for a long time.
However, before the age of social media, it was so much more difficult to run such misinformation campaigns. For malicious actors with such intentions, social media is a gift from heaven. On the role of Russian intelligence in such activities, there is a great New York Times feature that I highly recommend: ‘Operation Infektion’.
We are now at a point where powerful manipulation technology for misinformation and spreading fake news arrives at the fingertips of basically anyone
The next stage of authenticity disruption is already very visible, and many aspects of it have already left the prototype stage. Artificial intelligence techniques are bringing digital tools to life that enable even amateurs to create attractive and professional-looking altered content, be it audio, graphics or even video, and to do things that in the past took professional designers lots of manual work. Artificially generated or altered content of unprecedented quality is now becoming available within a few clicks.
An active community of developers is growing around open source projects, building free and easy to use software tools for creating Deepfakes.
Want to get an idea of what we are talking about?
“Text Manipulation”
Has this been written by humans?
We mostly think of “multimedia” these days when it comes to manipulation vulnerability. But plain old text is being tackled by AI big time. In fact, it is a whole different beast. Creating meaningful sentences, getting the semantics from the conjunction of words, is very challenging for machines.
OpenAI, the private research group focusing on ethics in AI, has just demonstrated its GPT2 tool. It is an AI text generator that the group immediately deemed too dangerous to be released publicly. It was trained with enormous amounts of computing power and data. Some of the text generated by GPT2 seems quite realistic. The following was created by it:
“So this clearly creates many more risks. The Web is already awash with low-quality human-generated spam and misinformation, but this sort of thing could dramatically lower the cost. And it could play potently into the confirmation bias of the conspiracy-prone, like anti-vaxxers.”
This tool produces remarkably human-sounding and coherent prose, which opens up the prospect of fake news circulated at industrial scale. Gone is our ability to rapidly discard an article based on bad spelling, bad grammar, lack of structure and lack of coherence. The output of tools like GPT2 is much harder to filter. There is no limit to the use cases. It could be used for anything from made-up Amazon reviews to dystopian ends, like huge, coordinated onslaughts of racist invective and fake news stories.
It is fair to assume that these technologies will diffuse rapidly. As a parallel, think of Leela, an open-source clone of DeepMind’s AlphaGo, which was released to the public only a few months after AlphaGo’s stunning successes.
“Voice Manipulation”
Can you still trust recorded voice?
Here’s an example of Adobe’s ‘VoCo’, which can be described as a ‘Photoshop for Audio’:
VoCo allows users to feed about ten to twenty minutes of someone’s voice into the application and then type words that are spoken in that exact voice. The resulting voice, which is composed of the person’s phonemes (the distinct units of sound that distinguish one word from another in a language), doesn’t sound even remotely computer-generated or made up.
Another one: already in February 2018, Baidu showed a program that can clone voices after analyzing even a seconds-long clip, using a neural network. So they claim they can clone any voice within just seconds of hearing it.
You can try it for yourself. Canadian AI startup Lyrebird lets you synthesize your own voice so that you can have anything spoken by their service as if it were you:
Picture & Image Manipulation
Can you still trust your eyes?
This is about creating stunning alterations and changes to existing images. But we are also talking about images that are being created completely artificially. We see seemingly real things that have fully been created by AI algorithms.
Creating artificial faces: these people, cats, and cars do not exist in reality. They were generated by software developed at chipmaker Nvidia, whose graphics chips have become crucial to machine learning projects:
Here are more images generated by algorithms created by researchers at DeepMind, Alphabet’s UK-based AI powerhouse. Their software BigGAN was trained on a giant database of 14 million varied images scraped from the internet, spanning thousands of categories, in an effort that required hundreds of Google’s specialized TPU machine learning processors. That broad experience of the visual world means the software can synthesize many different kinds of highly realistic looking images. Scraping off the internet can be done by pretty much anyone. Renting cloud-based TPU processing power is becoming more and more affordable as well:
Face manipulation: this is showing up in almost every social platform and messaging app these days. An interesting gem: Samsung was one of the first companies to embed functionality in its camera app that automatically doctors your selfies, which is not always welcomed by everyone:
Do you want that body and face of yours, or someone else’s, placed into any other setting? Remove.bg is a free cloud-based service that gives you an easy start. Their AI algorithms remove any background and deliver perfectly isolated people on a transparent background, ready to insert anywhere:
The supreme discipline: Video Manipulation
Can you still trust your senses?
Things have been getting serious since so-called DeepFakes appeared and started gaining more and more adoption. DeepFakes, in particular, are the product of recent advances in deep learning, in which sets of algorithms called neural networks learn to infer rules and replicate patterns by sifting through large data sets.
Here’s a particularly entertaining snippet from ‘Jimmy Kimmel Live!’:
DeepFakes technology has opened the door to techniques and formats called ‘re-enactment’, where various voice and video input sources are mixed for stunning results. Here are some examples, including manipulated clips of Vladimir Putin and Donald Trump whose facial expressions are altered in real time:
More stunning work including footage of Barack Obama:
This here is at least funnier. Here’s Ted Cruz, AI stitched onto The Tonight Show’s Paul Rudd:
This one here is particularly freaky.
Thanks to the rise of this technology, which enables highly realistic and difficult-to-detect digital manipulations of audio or video, it is becoming easier than ever to portray someone saying or doing something he or she never said or did.
This can be catastrophic in the public space, in politics, business, and entertainment. But it is already causing plenty of headaches and pain in the private space today. There is a lively open-source community making fast progress with hyper-realistic fake porn videos using neural nets. Users are employing deepfake technology to insert people’s faces into pornography without their consent or knowledge, and the growing ease of making fake audio and video content will create ample opportunities for blackmail, intimidation, and sabotage. Commercial and even free deepfake services have already appeared in the open market, and versions with alarmingly few safeguards are likely to emerge on the black market.
Since such material can be created with tools available to almost anyone with a laptop and access to the Internet, a lot of concern and headaches are coming our way.
What does this all mean for us?
400 years after the appearance of the first ‘mass’ media publishing activity, less than 100 years after the introduction of television, and only 25 years after the beginning of mass internet adoption, we have reached a point where the basis of trust in the authenticity of mass media is at risk of being completely shattered. Using technology for misinformation and propaganda is nothing new. More recently, technologies such as Photoshop have made doctoring images as easy as forging text. What makes Deepfakes unprecedented is their combination of quality, applicability to persuasive formats such as audio and video, and resistance to detection.
Imagine a world where you cannot trust any picture, sound clip or video that reaches you digitally rather than directly, because it can be falsified without much effort. What if you cannot trust any moving picture anymore? What if you cannot trust any digitally transmitted spoken word anymore? Today people are falling for fake, falsified or simply made-up information in text form on the internet. When we see a video of a famous person speaking in his or her “own” voice, today we would feel that it is authentic, wouldn’t we? It would be a future with a total loss of trust in ANY media, because any media can be faked. A future where the current level of ‘fake news’ seems like child’s play. A future where we need to redefine the way we interact with each other, consume news and build trust (personally, in business contexts and in a political context).
What we saw in the 2016 election is nothing compared to what we need to prepare for in the future. There could not only be thousands of fake-news articles floating around the Internet, but also countless fake videos and fake audio clips, too. It is not difficult to come up with plots for deepfake videos that could have horrific consequences when consumed by unprepared users. Deepfakes can be tailor-made to drive societies apart and pour gasoline on just about any culture war fire.
“2019 will be the year that a malicious ‘deepfake’ video sparks a geopolitical incident. We predict that within the next 12 months, the world will see the release of a highly authentic looking malicious fake video which could cause substantial damage to diplomatic relations between countries.”
Katja Bego, Senior Researcher at NESTA
In relations between countries and cultures, this is no less concerning. Imagine some political group utilizing that technology to create a fake hidden video clip of President Trump telling Rex Tillerson that he plans to drop a nuclear bomb on China. Imagine a video depicting the Israeli prime minister in private conversation with a colleague, seemingly revealing a plan to carry out a series of political assassinations in Tehran. Or an audio clip of Iranian officials planning a covert operation to kill Sunni leaders in a particular province of Iraq. Or a video showing an American general in Afghanistan burning a Koran. The reality is that other governments can weaponize fake news as an act of digital terror. Remarkably, mass population manipulation (in particular political control) arising from placing AI algorithms in charge of our information diet does not even necessarily require very advanced AI.
“I spoke recently with one of the most senior U.S. intelligence officials, who told me that many leaders in his community think we’re on the verge of a deepfakes “perfect storm.” […] First, this new technology is staggering in its disruptive potential yet relatively simple and cheap to produce. Second, our enemies are eager to undermine us. […] China will eventually be incredibly good at this, and we are not ready.”
Ben Sasse, US Senator [Washington Post]
Deepfakes and perfectly manipulated media may also erode democracy in other, less direct ways. The problem is not just that Deepfakes can be used to stoke social and ideological divisions. They can create a “liar’s dividend”: as people become more aware of the existence of Deepfakes, public figures caught in genuine recordings of misbehavior will find it easier to cast doubt on the evidence against them.
And now – what can we do about it? Is everything lost?
Considering what we have learned so far, it is obvious that we need a whole new approach to how digital media can be trusted in the future. We need to come up with vastly different concepts on how we can ensure the authenticity of our communication and personal messages and speech in a radically digital world. We will have to find a completely new way of how things we see, hear, feel can be trusted… or at least distinguished.
Here are a couple of ideas and possible solutions that are currently being discussed or developed. We need to consider them carefully and intensify work on them.
Education, education, education. Let’s face it. Whatever we do and whatever kind of countermeasures we develop, it is crucial to make people aware that these technologies exist and that they are being applied. This time it is about accepting the challenge and being vocal about it in the highest ranks of government, leadership, and society. If we can learn one thing from tumbling into the current crisis of trust in the media, it is that we need better education, and that we need to ramp up efforts to build the cultural skills required for maneuvering through digital media. People need to get a better feel for whom to trust and what to watch out for.
Digital forensics centered solutions. The idea is to leverage technology itself to counteract, identify and suppress digital fakery. This could happen directly at the moment when, for example, an image or video footage is initially created. Alternatively, measures can aim at detecting tinkering at any point in the life-cycle of a digital media item.
- Verification of image integrity (the moment it is taken):
- This involves, for example, performing checks to make sure the photographers aren’t trying to spoof the device’s location data and time stamp (e.g. do the camera’s coordinates, time zone, altitude, and nearby Wi-Fi networks all corroborate each other?). For each item, a digital fingerprint would be stored by computing mathematical values from the image.
- The bet is that authenticating videos by tracing them back to their source is a better solution than trying to sniff out forgeries after they have already been made.
- Challenge: scaling this (billions of photos each day) and integrating such checks & filters into platform upload processes or dedicated upload processes to fingerprint authentication services.
- Identifying modifications in media (in the aftermath).
- This is about digital forensics techniques that pick out whether any pixels or metadata seem altered. They can look for shadows or reflections that don’t follow the laws of physics. Does the light in the image refract as it would for a three-dimensional scene? Or is someone taking a picture of another two-dimensional photo? They would check how many times an image file has been compressed to determine whether it has been saved multiple times.
- Adobe itself is working on solutions that can be used to automatically spot edited pictures. The idea is that digital forensics done by humans can be automated by machines in much less time. However, this is still an early-stage research project and not yet available as a commercial product.
- Some training methods for creating Deepfakes involve feeding the model still images rather than video. As a result, particular human physiological quirks like breathing, blinking, etc. do not show up in the resulting computer-generated videos. As a countermeasure, AI systems have been developed that use computer vision to detect such blinking issues in fake videos.
- Downside: it is an arms race. Such measures serve only until the next wave of innovation. It is going to be a cat-and-mouse game. Even if extremely capable detection algorithms emerge, the speed with which Deepfakes can circulate on social media will always make debunking them an uphill battle. And by the time a forensic alarm bell rings, the damage may already be done.
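To make the “digital fingerprint” idea from the list above a bit more tangible, here is a minimal sketch in Python. It assumes the fingerprint is simply a SHA-256 hash of the raw file bytes; the function names `fingerprint` and `is_unmodified` are made up for illustration and are not taken from any of the products mentioned in this article.

```python
import hashlib


def fingerprint(media_bytes: bytes) -> str:
    """Compute a fingerprint of a media file (assumption: plain SHA-256 of its bytes)."""
    return hashlib.sha256(media_bytes).hexdigest()


def is_unmodified(media_bytes: bytes, registered_fingerprint: str) -> bool:
    """Check a file against the fingerprint that was registered at capture time."""
    return fingerprint(media_bytes) == registered_fingerprint


# Stand-in for the bytes of a photo at the moment it is taken.
original = b"raw bytes of a photo as captured"
registered = fingerprint(original)

# Any later change to the content, however small, changes the fingerprint.
tampered = original.replace(b"photo", b"video")

print(is_unmodified(original, registered))  # True
print(is_unmodified(tampered, registered))  # False
```

One caveat of this simple approach: a cryptographic hash also changes when a platform merely re-compresses or resizes a file, so it can only confirm that a file is bit-for-bit identical to what was registered. It cannot tell a harmless re-encoding from a malicious edit.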
Authentication & trust systems. This is a whole new type of approach. It is about developing digital identities and identification and authentication mechanisms that allow for unique identification and authentication of the creators and distributors of digital information, so that content can be matched to its creator and manipulation can be spotted.
- Blockchain-based trust systems that benefit from the blockchain’s unique foundation of digital trust.
- Blockchain solutions have the potential to emerge as effective fraud fighters in this context. This could power an automation of trust, so that at some point we would not trust a document presented to us at all unless it is anchored in a blockchain somewhere. Every piece of content could effectively be stamped with a record of authenticity that could be used later as a reference to compare to suspected fakes.
- Forensic solutions like those outlined above, which authenticate content before it ever spreads by digitally watermarking audio, photo, and video content at the moment of its creation, use metadata that can be logged immutably on a distributed ledger based on blockchain technology.
- Real-world products for this are being developed already today. Amber Video, a San Francisco startup, creates a breadcrumb trail that begins the moment a video is recorded. It uploads a unique fingerprint corresponding to each video and saves it on a blockchain, so that viewers can later check to make sure it hasn’t been tampered with. Another startup in this field is Truepic.
- Challenge: such solutions would need to be ubiquitously deployed across the vast array of devices that capture content, including smartphones, laptops, cameras, etc. An even bigger challenge: using such technology would need to be made a precondition for uploading content to the most popular digital platforms, such as Facebook, Instagram, Twitter, YouTube and many more. From today’s perspective that does not seem very likely. In the absence of a legal or regulatory obligation, the platforms will surely question whether it is affordable and in real demand, and they would fear losing market share to less rigorous competitors once they start blocking people from uploading unauthenticated content. Device makers are likely to object over concerns about interference with the performance of their products.
- A radical or unusual approach (in terms of today’s standards and habits) would be so-called authenticated alibi services.
- This is a more speculative and controversial technological approach that aims at Deepfakes threats to high-profile individuals, such as politicians and celebrities, with valuable but fragile reputations. To protect themselves against digital fakery and misinformation, they might opt to engage in enhanced forms of “lifelogging”, i.e. the practice of recording nearly every aspect of one’s life. All for the sake of proving where they were and what they were saying or doing at any given time. This could go as far as having downright partnerships with major news and social media platforms, which would enable rapid confirmation or debunking of content.
- Downside: such logging would be deeply invasive, and many people would object to such practices. But in addition to high-profile individuals and celebrities, some employers might begin considering such services for certain categories of important and particularly exposed employees. One use case very much in focus here is police officers, who are increasingly required to use body cameras. And it has already been demonstrated that some bodycams can be hacked and their video altered.
- Wider drastic consequence: even if only a relatively small number of people engage in intensive lifelogging, vast repositories of data would be produced in which the rest of us would find ourselves inadvertently caught. It would basically be a massive peer-to-peer surveillance network for constantly recording our activities.
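Coming back to the blockchain-based approach from the list above: the core idea of logging fingerprints “immutably” can be sketched with a simple hash chain, in which every entry contains the hash of the entry before it. The following Python sketch is a toy, single-machine illustration under that assumption. The class `FingerprintLedger` and its field names are hypothetical, and a real distributed ledger additionally involves many independent participants and a consensus mechanism, which is left out here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the very first entry


def record_hash(record: dict) -> str:
    """Hash a record deterministically (keys sorted before serializing)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class FingerprintLedger:
    """Toy append-only log in which each entry is chained to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, media_fingerprint: str, creator: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS_HASH
        body = {
            "fingerprint": media_fingerprint,
            "creator": creator,
            "prev_hash": prev_hash,
        }
        entry = dict(body, entry_hash=record_hash(body))
        self.entries.append(entry)
        return entry

    def contains(self, media_fingerprint: str) -> bool:
        """Has this fingerprint ever been registered?"""
        return any(e["fingerprint"] == media_fingerprint for e in self.entries)

    def is_consistent(self) -> bool:
        """Recompute every hash; any retroactive edit breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: entry[k] for k in ("fingerprint", "creator", "prev_hash")}
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != record_hash(body):
                return False
            prev_hash = entry["entry_hash"]
        return True
```

A viewer could then compute the fingerprint of a video they received and ask whether it `contains` that value, while `is_consistent` shows why quietly rewriting an old entry would be noticed: the stored hashes would no longer match.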
Legal measures: last but not least, we certainly need to review and adapt legal frameworks so that they go hand in hand with possible technical countermeasures. Even when there are ways to verify the authenticity of a piece of content, they are toothless tigers unless there is legal pressure to use them.
- Target individual creators: this is about introducing new laws or adapting existing ones to criminalize the malicious creation and distribution of fake and manipulated content, in order to punish people who knowingly alter digital text, videos, photos, and audio of others, including Deepfakes, without their consent.
- Target distributors, such as Facebook or YouTube, in cases where they knowingly distribute manipulated media. That means these platforms would need to set up reporting systems, like the ones used to suppress pirated movies, and take down fake content when they are notified of it.
- Downside: such legal measures risk putting over-broad liability on distributors. This could scare platforms into immediately taking down everything that is reported as manipulated, potentially deleting legitimate content in the process and leading to excessive censorship.
Our democracies need to be prepared
In the meantime, democratic societies will have to learn resilience. It is no use trying to hide from this or trying to turn back. It is going to happen, so let’s be open about the risks and challenges of the future.
In 2018 Dutch company DeepTrace, whose mission is to ‘empower people in understanding and trusting what they see’, identified only 25 new research papers focused on detecting fake imagery, while finding 902 papers on research that helps push Deepfake technology further.
We will have to accept that audio and video content cannot be taken at face value any more. But we will have to fight the total descent into a post-truth world, in which citizens retreat to their private information bubbles and regard as fact only that which flatters their own beliefs.
In short, democracies will have to accept an uncomfortable truth: in order to survive the threat of Deepfakes, they are going to have to learn how to live with lies.