[Jay Alammar] presents a pictorial guide to how Stable Diffusion works, and its principles are perfectly applicable to understanding how similar systems like OpenAI's DALL-E or Google's Imagen work under the hood. These systems are perhaps best known for their amazing ability to turn text prompts (such as "paradise cosmic beach") into a matching image. Well, usually, anyway.
'System' is an appropriate term, because Stable Diffusion (and similar systems) are actually made up of many separate components working together to make the magic happen. [Jay]'s illustrated guide really shines here, as it starts at a very high level with just three components (each with its own neural network) and drills down as needed to explain what's going on at a deeper level and how it all fits together.
Some may be surprised to discover that the image-making part doesn't work the way a human does. That is, it doesn't start with a blank canvas and build a picture up bit by bit from the ground. It starts with a seed: a bunch of random noise. Noise gets subtracted in a series of steps, each resulting in less noise and a more aesthetically pleasing and (ideally) coherent image. Combine that noise removal with the ability to guide it in a direction that favors conformity to a text prompt, and one has the bones of a text-to-image generator. There is, of course, much more to it, and [Jay] goes into considerable detail for those interested.
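The core idea of "start with noise and repeatedly subtract it" can be sketched in a few lines. This is a deliberately toy illustration, not the real Stable Diffusion pipeline: the `predict_noise` function here is a hypothetical stand-in for the trained noise-predicting neural network (a U-Net in the real system), and the `target` image exists only so the loop has something to converge toward.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def predict_noise(latent, target):
    # Stand-in for the trained noise-predicting network.
    # In the real system a U-Net estimates the noise present in the
    # latent; here we cheat and compute it from a known target so the
    # loop visibly converges.
    return latent - target

target = np.ones((4, 4))          # a pretend "clean image"
latent = rng.normal(size=(4, 4))  # the seed: pure random noise

# Denoise in a series of steps, removing a fraction of the
# estimated noise each time.
for step in range(50):
    noise_estimate = predict_noise(latent, target)
    latent = latent - 0.1 * noise_estimate

# After the loop, the remaining error is tiny: the noise has been
# subtracted away, leaving (in this toy case) the target image.
print(np.abs(latent - target).max())
```

In the real system, text guidance works by nudging each denoising step toward noise estimates that make the latent more compatible with the prompt's encoding, but the basic "less noise each step" loop is the same shape.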
If you're unfamiliar with Stable Diffusion or image-generating AI in general, it's one of those fields that's changing so rapidly that it sometimes seems impossible to keep up. Luckily, our own [Matthew Carlson] explains all about what it is and why it matters.
Stable Diffusion can be run locally. There's a fantastic open-source web UI, so there's no better time to get up to speed and start experimenting!