Miso-diffusion-M-1.0

Important: please read this before using the model. It is still very experimental: I am still trying to find the optimal settings, and the model will gradually exit beta as training becomes more stable.

You can download the CLIP text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-1.0

I will also write two articles soon on the details of the model.

Miso Diffusion M 1.0 is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder.

This version is a step up from the previous version (beta). It was trained on the same 160k images for 3 more epochs, then fine-tuned on 600k images for another 2 epochs. (Two epochs were chosen because further training caused the model to generate more artifacts and blurrier images.)

Recommended settings: euler, cfg: 5, 28-40 steps (denoise: 0.95 or 1).
Prompt: danbooru-style tagging.

I recommend simply generating with a batch size of 4 to 8 and picking the best result. The model will struggle with hands and complex poses; you can add "upper body" to the prompt so that it does not generate a full body.

Quality tags: Masterpiece, Perfect Quality, High quality, Normal Quality, Low quality
Aesthetic tags: Very Aesthetic, aesthetic
Pleasant tags: Very pleasent, pleasent, unpleasent
Additional tags: high resolution, elegant

Training was done at 1024x1024. Since the base model natively supports 1440, certain prompts will work at 1440x1440 as well.

Training was done on a GH200 with 96 GB of VRAM.

Training settings: Adafactor with a batch size of 40, lr_scheduler: cosine

SD3.5-specific settings:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
learning_rate = 3e-6
learning_rate_te1 = 2e-6
learning_rate_te2 = 2e-6
Train CLIP: true, train t5xxl: false

Developing a base model is costly, so if you like my work please consider a donation. Thanks a lot: https://ko-fi.com/suzushi2024
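The recommended generation settings and the tag vocabulary above can be collected into a small helper. The following is only a minimal sketch: the names `RECOMMENDED`, `build_prompt`, and `steps_in_recommended_range` are hypothetical, and putting the quality and aesthetic tags first and "upper body" last is an assumption, not something the model card specifies. It does not load or run the model; it only assembles a danbooru-style prompt string.

```python
# Tag vocabularies copied from the model card (best tier listed first).
QUALITY_TAGS = ["Masterpiece", "Perfect Quality", "High quality",
                "Normal Quality", "Low quality"]
AESTHETIC_TAGS = ["Very Aesthetic", "aesthetic"]

# Recommended settings from the model card, gathered in one place.
RECOMMENDED = {
    "sampler": "euler",
    "cfg": 5,
    "steps": (28, 40),        # inclusive range
    "denoise": (0.95, 1.0),   # either value
    "batch_size": (4, 8),     # generate several, pick the best
    "resolution": (1024, 1024),
}


def build_prompt(subject_tags, quality="Masterpiece",
                 aesthetic="Very Aesthetic", upper_body=True):
    """Join quality, aesthetic, and subject tags into one comma-separated prompt.

    Tag order is an assumption made for this sketch.
    """
    if quality not in QUALITY_TAGS:
        raise ValueError(f"unknown quality tag: {quality!r}")
    if aesthetic not in AESTHETIC_TAGS:
        raise ValueError(f"unknown aesthetic tag: {aesthetic!r}")
    tags = [quality, aesthetic, *subject_tags]
    # The card suggests "upper body" to avoid full-body compositions.
    if upper_body and "upper body" not in tags:
        tags.append("upper body")
    return ", ".join(tags)


def steps_in_recommended_range(steps):
    """Return True if the step count falls inside the recommended 28-40 range."""
    low, high = RECOMMENDED["steps"]
    return low <= steps <= high


if __name__ == "__main__":
    print(build_prompt(["1girl", "smile"]))
    # Masterpiece, Very Aesthetic, 1girl, smile, upper body
```

The resulting string can be pasted into the positive prompt of whichever front end you use; pass `upper_body=False` when a full-body composition is actually wanted.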
