Image generators like Stable Diffusion can create what look like real photographs or hand-crafted illustrations depicting just about anything a person can imagine. This is possible thanks to algorithms that learn to associate the properties of a vast collection of images, taken from the web and image databases, with their text labels. The algorithms learn to render new images that match a text prompt through a process that involves adding random noise to an image and then removing it.
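The noise process described above can be illustrated with a toy sketch. It is not Stable Diffusion's actual code: the function names, the four-value "image," and the `alpha_bar` parameter (the share of the original signal that survives) are assumptions made for illustration. A real system trains a neural network to estimate the noise; here the true noise is simply handed back, which shows only that an accurate estimate is enough to recover the picture.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Blend pixel values with Gaussian noise.

    alpha_bar is in (0, 1]: closer to 1 keeps more of the original image,
    closer to 0 leaves mostly noise. (Hypothetical parameter name.)
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, noise_estimate, alpha_bar):
    """Invert the blend above, given an estimate of the noise that was added."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(noisy, noise_estimate)
    ]


rng = random.Random(0)                 # fixed seed so runs are repeatable
image = [0.1, 0.5, 0.9, 0.3]           # stand-in for real pixel data
noisy, noise = add_noise(image, alpha_bar=0.5, rng=rng)

# A trained model would predict `noise` from `noisy` and the text prompt;
# passing the true noise here is a shortcut for the sake of the sketch.
restored = remove_noise(noisy, noise, alpha_bar=0.5)
```

In practice the removal happens over many small steps, and the text prompt steers the model's noise estimate at each one.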
Because tools like Stable Diffusion use images scraped from the web, their training data often includes pornographic images, making the software capable of generating new sexually explicit pictures. Another concern is that such tools could be used to create images that appear to show a real person doing something compromising, which could be used to spread misinformation.
The quality of AI-generated imagery has soared in the past year and a half, starting with the January 2021 announcement of a system called DALL-E by AI research company OpenAI. It popularized the model of generating images from text prompts, and was followed in April 2022 by a more powerful successor, DALL-E 2, now available as a commercial service.
From the outset, OpenAI has restricted who can access its image generators, providing access only through a prompt interface that filters what can be requested. The same is true of a competing service called Midjourney, released in July of this year, which helped popularize AI-made art by being widely accessible.
Stable Diffusion is not the first open source AI art generator. Not long after the original DALL-E was released, a developer built a clone called DALL-E Mini that was made available to anyone and quickly became a meme-making phenomenon. DALL-E Mini, later rebranded as Craiyon, still includes guardrails similar to those in the official versions of DALL-E.
Clement Delangue, CEO of Hugging Face, a company that hosts many open source AI projects, including Stable Diffusion and Craiyon, says it would be problematic for the technology to be controlled by only a few large companies.
“If you look at the long-term development of the technology, making it more open, more collaborative, and more inclusive, is actually better from a safety perspective,” he says. Closed technology is more difficult for outside experts and the public to understand, he says, and others cannot build on top of it. It is better if outsiders can assess models for problems such as race, gender, or age biases. On balance, he says, the benefits of open sourcing the technology outweigh the risks.
Delangue points out that social media companies could use Stable Diffusion to build their own tools for spotting AI-generated images used to spread disinformation. He says that developers have also contributed a system for adding invisible watermarks to images made using Stable Diffusion so they are easier to trace, and built a tool for finding particular images in the model’s training data so that problematic ones can be removed.
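To give a sense of what an invisible watermark is, here is a deliberately simple sketch. It uses least-significant-bit embedding, a swapped-in textbook technique, and not the method used by the Stable Diffusion tooling, which the article does not describe. The function names and the pixel values are hypothetical.

```python
def embed_watermark(pixels, bits):
    """Hide each watermark bit in the lowest bit of one pixel value (0-255).

    Changing the lowest bit shifts a pixel's brightness by at most 1,
    which is not visible to the eye.
    """
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit   # clear the lowest bit, then set it
    return marked


def extract_watermark(pixels, length):
    """Read the watermark back from the lowest bit of the first `length` pixels."""
    return [p & 1 for p in pixels[:length]]


pixels = [200, 13, 77, 54, 255, 0, 128, 91]   # stand-in grayscale values
bits = [1, 0, 1, 1]                            # hypothetical watermark payload

marked = embed_watermark(pixels, bits)
recovered = extract_watermark(marked, len(bits))
```

A mark this simple would not survive resizing or compression, so watermarks meant for tracing images need more robust encodings.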
After taking an interest in Unstable Diffusion, Simpson-Edin became a moderator on the Unstable Diffusion Discord. The server forbids people from posting certain kinds of content, including images that could be interpreted as underage pornography. “We can’t moderate what people do on their own machines but we’re extremely strict with what’s posted,” she says. In the near term, containing the disruptive effects of AI art-making may depend more on humans than machines.