Can DreamBooth Stable Diffusion be trained with xformers, 8-bit Adam, gradient checkpointing, and cached latents on just 10 GB of VRAM?
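For anyone wondering what those four switches actually do, here's a rough sketch using the diffusers and bitsandbytes Python APIs. The model name, learning rate, and the `cache_latents` helper are illustrative assumptions on my part, not the exact script the question refers to, so check your training script's README for the real options.

```python
# A minimal sketch of the 10 GB memory savers, assuming a recent diffusers
# and bitsandbytes install (model name and hyperparameters are illustrative).
import torch
import bitsandbytes as bnb
from diffusers import AutoencoderKL, UNet2DConditionModel

model = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model, subfolder="vae")

unet.enable_gradient_checkpointing()                # recompute activations instead of storing them
unet.enable_xformers_memory_efficient_attention()   # needs xformers installed
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=5e-6)  # 8-bit Adam

# Caching latents: run each training image through the frozen VAE once,
# keep the latents, then free the VAE so it doesn't sit in VRAM during training.
@torch.no_grad()
def cache_latents(images):  # hypothetical helper; 'images' is a batch of image tensors
    return vae.encode(images).latent_dist.sample() * 0.18215
```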
Same, haha. I used to think 4 GB was a lot, and when I recently upgraded to an 8 GB card I figured I'd never need to upgrade again. I had never /heard/ of 16 GB+ cards before all this SD stuff happened, and I was a little dismayed when I couldn't even manage 512x512 on my 8 GB 3070.
But holy hell, the optimisations are rolling out quicker than I can blink, and now I might be able to run DreamBooth on my card? I'm stunned, lol.
Also, check this out:
Sounds like a tough spot, but even small steps help. Here's what I know: training the text encoder alongside the UNet improves image quality dramatically, especially when the text encoder is fine-tuned with a low learning rate and enough training steps. Fine-tuning it needs more memory, though, ideally a GPU with 24 GB. Techniques such as 8-bit Adam, fp16 training and gradient accumulation make training possible on 16 GB GPUs like those on Google Colab or Kaggle. Their experiments have also shown that plain, natural descriptive prompts work well, so there's no need for rare tokens from the vocabulary.
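If it helps, here's roughly how that 16 GB recipe fits together in code. This is only a sketch assuming the accelerate, diffusers, transformers and bitsandbytes Python APIs, with illustrative model names and hyperparameters rather than the exact training script.

```python
# A rough sketch of fine-tuning the text encoder alongside the UNet with
# fp16 mixed precision, gradient accumulation, and 8-bit Adam (all values illustrative).
import itertools
import bitsandbytes as bnb
from accelerate import Accelerator
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

accelerator = Accelerator(
    mixed_precision="fp16",          # fp16 training
    gradient_accumulation_steps=2,   # simulate a larger batch on a smaller GPU
)

model = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(model, subfolder="text_encoder")

# Train the text encoder together with the UNet, using a low learning rate
params = itertools.chain(unet.parameters(), text_encoder.parameters())
optimizer = bnb.optim.AdamW8bit(params, lr=1e-6)  # 8-bit Adam

unet, text_encoder, optimizer = accelerator.prepare(unet, text_encoder, optimizer)
```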
Waiting for 6 GB here.