Using the Stable Diffusion WebUI

Installing Stable Diffusion is one thing. Using the Stable Diffusion WebUI to create decent checkpoints, and understanding the different weights and measures, sampling sets, and prompts, is where the real magic (and frustration) happens.

Why Stable Diffusion?

Recently I wrote an article for my employer, allcode.com, on how to set up Stable Diffusion and use Dreambooth to train models on your own images. This personalises the outputs, allowing you to create artistic renditions of yourself with various prompts.

The critical element is that it can run on a local gaming PC (although you do have to train the models on a pay-on-demand server). I highly recommend you read that article if you are interested in how to set up the various systems.

A long, long time ago, I used to love art; painting, drawing, and sculpting were really enjoyable for me. But sadly, over the years, I’ve let the meagre skills I had atrophy, and trying to start again from square one is just disheartening.

So I’m finding this fusion between coding, AI, writing and art really fascinating. I see a lot of artists complaining about AI art (and with good reason). I definitely find people selling AI art really dubious and troubling. That said, it’s here, it’s not going away, and as I’ll show throughout these blog posts, getting good results out of it is not so easy. I can see it being both a great tool for artists and a problem in years to come.

 

AI-generated image of me as Legolas

What’s the plan

Over a series of posts, I’m going to write up what all the parts of the Stable Diffusion WebUI interface do, what they mean and how they relate back to things like Dreambooth and training your own models. I’ll include a little about configs for training models and probably show how changes to the training affected the outputs. 

 

What won’t be covered

I’m not going to explain all the various installs, libraries, variations and forks of Stable Diffusion, nor am I going to compare the outputs to other AI art creators like Midjourney. I’m also not going to answer questions on how to resolve code breakages, unless I happen to experience one while writing.

Maybe when I’ve finished playing with Stable Diffusion, I’ll look at others in the future.

 

jamestyrrell as a Warhammer 40k Space Marine SD, Dreambooth

The WebUI homepage

Today I’ve set the scene for what I hope to create, so for now I’m just going to give a high-level overview of the various tabs and what they do.

Holy sh*t, there are a lot of options here

If you’ve followed my tutorial at allcode.com, you’ve already installed Stable Diffusion, run your personalised images through Dreambooth and are ready to get down to business. The first thing you see when you open the WebUI is a lot of options that, aside from the prompt box, will make little sense. And that is before mentioning that this is only tab 1 of 8.

WebUI screen shot

txt2img

This is probably the most recognizable tab, as it is the key point of popularity for the Stable Diffusion craze. It’s where you type in prompts (both negative and positive) and generate images. On a default install of Stable Diffusion you can still get access to the massive wealth of information on artists, celebrities and politicians when creating these prompts. However, do remember to complete the Dreambooth training if you want to personalise any generated images. 

img2img

Image-to-image AI art generation (img2img) uses the same principle as txt2img above. Users still enter prompts for the AI. The main difference between the two is that a base image is added to the mix.

Users upload a base photo where the AI applies changes based on entered prompts. Continuous iteration of this process creates refined and sophisticated art.

Extras

In this section you can upscale your produced images to bigger sizes (images are usually produced at 512×512), and use other tools to improve images that you like but feel need work:

  • GFPGAN, neural network that fixes faces
  • CodeFormer, face restoration tool as an alternative to GFPGAN
  • RealESRGAN, neural network upscaler
  • ESRGAN, neural network upscaler with a lot of third party models
  • SwinIR and Swin2SR neural network upscalers
  • LDSR, Latent diffusion super resolution upscaling

PNG info

If you have an image generated by Stable Diffusion, it should include all the details of the prompt, negative prompt, seed, model and other information used to generate it. So rather than trying to remember a prompt that you liked, you can find the produced image and drag-and-drop it onto the PNG info tab. You will recover all the info and can then send it to other areas of the WebUI.
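If you are curious how that information travels with the image, here is a minimal sketch using only the Python standard library. It assumes the WebUI stores its settings in a plain PNG text chunk (tEXt) under the keyword "parameters", which is what I believe it does, but treat that keyword as an assumption and check your own files. The tiny demo PNG is hand-built purely for illustration; it is not a real Stable Diffusion output.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt payload is: keyword, a NUL separator, then Latin-1 text.
            keyword, _, text = payload.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 8 + length + 4
    return chunks


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (used here only to fabricate the demo file)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Demo: a hand-built 1x1 greyscale PNG carrying a "parameters" text chunk.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
demo_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"tEXt", b"parameters\x00Steps: 60, Sampler: DDIM, Seed: 1940895508")
    + make_chunk(b"IDAT", idat)
    + make_chunk(b"IEND", b"")
)

info = read_png_text_chunks(demo_png)
print(info["parameters"])
```

To try it on one of your own images, read the file with open(path, "rb").read() and pass the bytes to read_png_text_chunks. Text stored in other chunk types (such as iTXt) is not handled by this sketch.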

Checkpoint Merger

Combine and blend different checkpoints and models. 
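To give a feel for what "blending" can mean, here is a rough sketch of a weighted sum, where each weight in the result is A * (1 - M) + B * M for a multiplier M between 0 and 1. This is an assumption about the general technique, not a description of the WebUI's internals. Real checkpoints hold large tensors; plain Python lists and the key names below are hypothetical stand-ins.

```python
def weighted_sum_merge(model_a: dict, model_b: dict, multiplier: float) -> dict:
    """Blend two toy 'state dicts': result = A * (1 - M) + B * M per weight."""
    if not 0.0 <= multiplier <= 1.0:
        raise ValueError("multiplier must be between 0 and 1")
    merged = {}
    for key, weights_a in model_a.items():
        if key not in model_b:
            # Keys missing from model B are copied from model A unchanged.
            merged[key] = list(weights_a)
            continue
        merged[key] = [
            a * (1.0 - multiplier) + b * multiplier
            for a, b in zip(weights_a, model_b[key])
        ]
    return merged


# Hypothetical toy checkpoints with two "layers" each.
checkpoint_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
checkpoint_b = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}

blended = weighted_sum_merge(checkpoint_a, checkpoint_b, 0.5)
print(blended)
```

A multiplier of 0 gives you model A back, 1 gives you model B, and anything in between is a mix of the two.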

Train

If you have a monster of a machine with 24 GB of VRAM on your GPU, you can train your models locally; on anything less, trying will normally just make the system crash and laugh at you. If you want, there is a “Dreambooth extension”, found in the Extensions tab described below, which can help lower-powered systems train. Otherwise you’re better off training models on a service like https://www.runpod.io/ until either the systems get smarter (less GPU required) or the graphics cards get cheaper.

Settings

The settings tab for Stable Diffusion WebUI allows you to customize the various options and preferences. The options and preferences available include things like the color scheme, where output images are stored, and the way that the program interacts with other programs or devices. In general, the settings tab is where you can fine-tune the way that the Stable Diffusion WebUI works to suit your needs and preferences.

Extensions

An extension is a small software program that can be installed in the Stable-Diffusion WebUI to add or modify the functionality. Extensions can be used to add features to the WebUI, such as a new tab, or to modify the behavior of Stable Diffusion, such as reducing the load for Dreambooth or blocking certain websites. Extensions can be installed and updated from this tab, and once installed, they can be turned on or off as required.

Stable Diffusion v2.0 and Prompts

The update brings improvements to SD’s text-to-image diffusion models, includes a powerful image upscaler, updates its Inpainting model, and more.

It also appears to have made massive changes to the prompting system: comparing Stable Diffusion v1.4 to v2 gives pretty vast differences. They produced a prompts guide, which was online but seems to have disappeared.

Irritatingly, it couldn’t be downloaded as a .pdf. Me being a massive nerd, I said f**k that, used Python to pull the SVGs and created a PDF.

Here are a few examples of how different things can get

The examples below, with the relevant seeds and prompts, show how dramatically things have changed. It should also be mentioned that a lot of celebrities and NSFW images have been removed from the model, and famous artists like Rutkowski have been purged after complaints that their style was essentially being stolen by the machine.

What has been removed from Stable Diffusion’s training data, though, is nude and pornographic images. AI image generators are already being used to generate NSFW output, including both photorealistic and anime-style pictures. However, these models can also be used to generate NSFW imagery resembling specific individuals (known as non-consensual pornography) and images of child abuse.

Of course, there are a lot of angry Incels now wailing about censorship. This is bullshit, this stuff is open-source. No doubt a horde of horny teenagers are using PornHub and other sites to train NSFW models. 

Prompt: Gandalf, d & d, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, hearthstone, art by artgerm and greg rutkowski and alphonse mucha
Negative prompt: cartoon, 3d, ugly face, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, hands

Steps: 60, Sampler: DDIM, CFG scale: 12, Seed: 1940895508, Face restoration: GFPGAN, Size: 512×512, Model hash: 7460a6fa, Batch size: 6, Batch pos: 0
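If you want to compare runs like these side by side, the settings line is easy to pull apart. Here is a minimal sketch that assumes the line is a simple comma-separated list of "Key: value" pairs, as in the example above. The function name is my own, and values that themselves contain ", " would need a smarter parser.

```python
def parse_generation_settings(line: str) -> dict:
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings = {}
    for field in line.split(", "):
        key, _, value = field.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


line = (
    "Steps: 60, Sampler: DDIM, CFG scale: 12, Seed: 1940895508, "
    "Face restoration: GFPGAN, Size: 512×512, Model hash: 7460a6fa, "
    "Batch size: 6, Batch pos: 0"
)

settings = parse_generation_settings(line)

# The size may be written with "x" or with the multiplication sign, so accept both.
width, height = (int(n) for n in settings["Size"].replace("×", "x").split("x"))

print(settings["Sampler"], settings["Seed"], width, height)
```

Keeping the values as strings is deliberate: the seed and model hash are identifiers you copy back into the WebUI, not numbers you do maths on.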

Stable Diffusion 1.4

Stable Diffusion 2.0

Prompt: Portrait digital art of Bill Murray from Scrooged (Arcane). wearing a suit, Christmas,

Negative prompt: cartoon, 3d, ugly face, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, hands

Steps: 40, Sampler: DDIM, CFG scale: 13, Seed: 3408805356, Face restoration: GFPGAN, Size: 512×512, Model hash: 7460a6fa, Batch size: 6, Batch pos: 0

Stable Diffusion 1.4

Portrait digital art of Bill Murray from Scrooged (Arcane). wearing a suit, Christmas,
Negative prompt: cartoon, 3d, ugly face, (disfigured), (bad art), (deformed), (poorly drawn), (extra limbs), strange colours, blurry, boring, sketch, lacklustre, repetitive, cropped, hands
Steps: 40, Sampler: DDIM, CFG scale: 13, Seed: 3408805360, Face restoration: GFPGAN, Size: 512x512, Model hash: 09dd2ae4, Batch size: 6, Batch pos: 4

Stable Diffusion 2.0

What’s next

I’ve started writing up how the Stable Diffusion WebUI works and will probably write up how to scrape a Google SVG slideshow.

AI Generated Comics and Copyright

What is the full story

So recently the writer Kris Kashtanova used Midjourney AI to generate all the images in a comic, and now the U.S. Copyright Office appears to be backtracking on its decision to grant protection to an AI-generated comic book. On the face of it this seems like a no-brainer: of course Kashtanova shouldn’t be able to copyright images generated by a machine, no matter how carefully they had to type the prompts to get the images they wanted for the story (no mean feat).

However, there is a contentious element here for me: it looks like the U.S. Copyright Office is removing the copyright from the entire book, including the writing. Perhaps this isn’t the case and I’m misreading things, but even in a comic book, art and writing are two separate elements. Kashtanova didn’t use ChatGPT to write the story or text in the comic. That was Kashtanova’s own work, so the book should still be copyrighted to its writer even if the art isn’t.

It would be like me finding royalty-free images online and using them to illustrate my story. Am I now in danger of losing my other rights? And what about a talented painter who uses ChatGPT to help write a story with prompts and then paints the scenes? What happens to their rights?

The future is going to get even messier

Sometimes it is hard to realize the incredible progress we’ve made in terms of cheap, accessible computing power and AI. Then something like this comes along and people are shocked, but this is really just the beginning. What we are witnessing is the tip of the AI glacier that is about to crash down on society.

In 10 years AI art generators will be an order of magnitude better, and so will our ability to prompt them with far more nuance. Faces, eyes, and body proportions are all hit or miss currently. In 10 or 20 years we’ll see fusions of deepfakes, art and media that will change a lot of industries, forever.

Let’s take an example. I love the film Scrooged, so in 20 years I think, hey, let’s watch that, but I’ll make it a cartoon like Arcane, add labelling to the action scenes like in Scott Pilgrim vs the World, and maybe swap out the bad guy for Alan Rickman. Then, maybe 20 or 30 minutes later, I watch my new personalised version of Scrooged.

That’s not hyperbole, that’s just where things are going to be on the route we’re currently heading. If you think that sounds crazy, remember that 20 years ago the best kind of phone you could have was a Nokia. 

 

Portrait digital art of Bill Murray from Scrooged (Arcane). wearing a suit, Christmas,

What’s next

I’m going to write more about this because it really interests me and as I study and learn about it I’ll keep a running commentary on my blog.

It’s the first time in a long time I’ve seen a subject that’s blown me away like this, so hopefully that will translate into me getting back into writing fiction. I’ll just have to be careful how I illustrate it, I guess ;).