You may have noticed some impressive video memes made with AI in recent weeks. Harry Potter reimagined as a Balenciaga commercial and nightmarish footage of Will Smith eating spaghetti both recently went viral. They highlight how quickly AI’s ability to create video is advancing, as well as how problematic some uses of the technology may be.
These videos remind me of the moment AI image-making tools became widespread last year, when programs like Craiyon (formerly known as DALL-E Mini) let anyone conjure up recognizable, if crude and often surreal, images, such as surveillance footage of babies robbing a gas station, Darth Vader courtroom sketches, and Elon Musk eating crayons.
Craiyon was an open source knockoff of the then carefully restricted DALL-E 2 image generator from OpenAI, the company behind ChatGPT. The tool was the first to show AI’s ability to take a text prompt and turn it into what looked like real photos and human-drawn illustrations. Since then, DALL-E has become open to everyone, and programs like Midjourney and Dream Studio have developed and honed similar tools, making it relatively trivial to craft complex and realistic images with a few taps on a keyboard.
As engineers have tweaked the algorithmic knobs and levers behind these image generators, added more training data, and paid for more GPU chips to run everything, these image-making tools have become incredibly good at faking reality. To take a few examples from a subreddit dedicated to strange AI images, check out Alex Jones at a gay pride parade or the Ark of the Covenant at a yard sale.
Widespread access to this technology, and its growing sophistication, force us to rethink how we view online imagery, as was highlighted after AI-made images purporting to show Donald Trump’s arrest went viral last month. The incident led Midjourney to announce that it would no longer offer a free trial of its service, a fix that might deter some cheapskate bad actors but leaves the broader problem untouched.
As Startup’s Amanda Hoover writes this week, algorithms still struggle to generate convincing video from a prompt. Creating many individual frames is computationally expensive, and as today’s jittering and sputtering videos show, it is hard for algorithms to maintain enough coherence between them to produce a video that makes sense.
AI tools are, however, getting a lot more adept at editing videos. The Balenciaga meme, along with versions referencing Friends and Breaking Bad, was made by combining a few different AI tools, first to generate still images and then to add simple animation effects. But the end result is still impressive.
Runway ML, a startup that’s developing AI tools for professional image and video creation and editing, this week launched a new, more efficient technique for applying stylistic changes to videos. In just a few minutes, I used it to transform an existing video into this dreamlike footage of my cat, Leona, walking through a “cloudscape.”
Different machine learning techniques open new possibilities. A company called Luma AI, for instance, is using a technique known as neural radiance fields to turn 2D photographs into detailed 3D scenes. Feed a few snapshots into the company’s app, and you’ll have a fully interactive 3D scene to play with.
These clips suggest we are at an inflection point for AI video making. As with AI image generation, a growing rush of memes could be followed by significant improvements in the quality and controllability of AI videos that lodge the technology in all sorts of places. AI may well become a muse for some auteurs. Runway’s tools were used by the visual effects artists working on the Oscar-winning Everything Everywhere All At Once. Darren Aronofsky, director of The Whale, Black Swan, and Pi, is also a fan of Runway.
But you only need to look at how advanced images from Midjourney and Dream Studio are now to sense where AI video is heading—and how difficult it may become to distinguish real clips from fake ones. Of course, people can already manipulate videos with existing technology, but it’s still relatively expensive and difficult to pull off.
The rapid advances in generative AI may prove dangerous in an era when social media has been weaponized and deepfakes are propagandists’ playthings. As Jason Parham wrote for Startup this week, we also need to seriously consider how generative AI can recapture and repurpose ugly stereotypes.
For now, the instinct to trust video clips is mostly reliable, but it might not be long before the footage we see is less solid and truthful than it once was.