#06-technical-discussion

umar ıgan

09/04/2023, 4:58 PM
Hello everyone, I am looking for a simple, elegant way of fine-tuning a diffusion model using DreamBooth on a small sample dataset. Any example you could share would be appreciated. I have no experience with diffusers, so the examples I found on Google are confusing me. I want to do a simple task on a Google Colab T4 GPU.

Jett Sjöberg

09/04/2023, 10:21 PM
Have a look at the diffusers DreamBooth example on GitHub, which runs with accelerate. If this needs to run in production it requires more, but a single training run is easily done on a Colab T4 in 800-1200 steps. https://github.com/huggingface/diffusers/tree/main/examples/dreambooth
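For readers following along, here is a minimal sketch of how a single run of the `train_dreambooth.py` script from that examples folder could be assembled from a Colab cell. The directory names, the prompt, and the learning rate are placeholder assumptions, not values from this thread, and the model id is assumed to be the Hub name of the SD-2-1-base checkpoint mentioned later. The function only builds the command; it does not start training.

```python
import shlex

def dreambooth_command(instance_dir, output_dir, prompt,
                       model="stabilityai/stable-diffusion-2-1-base",
                       resolution=512, max_train_steps=800):
    """Return the argv list for one DreamBooth run (nothing is executed here)."""
    return [
        "accelerate", "launch", "train_dreambooth.py",
        f"--pretrained_model_name_or_path={model}",
        f"--instance_data_dir={instance_dir}",   # folder with the 4-24 training images
        f"--output_dir={output_dir}",
        f"--instance_prompt={prompt}",
        f"--resolution={resolution}",            # 512, matching the pre-scaled images
        "--train_batch_size=1",
        "--gradient_accumulation_steps=1",
        "--learning_rate=5e-6",                  # assumed starting point, tune as needed
        "--lr_scheduler=constant",
        "--lr_warmup_steps=0",
        f"--max_train_steps={max_train_steps}",  # the thread suggests 800-1200
    ]

# Placeholder paths and prompt, for illustration only.
cmd = dreambooth_command("./instance_images", "./dreambooth_out", "a photo of sks car")
print(shlex.join(cmd))
```

The printed line can be pasted into a Colab cell after a `!`, or the list can be passed to `subprocess.run(cmd, check=True)`.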

umar ıgan

09/04/2023, 10:59 PM
Thanks Jett, I am trying this one on a T4. It still feels slow, so I am checking lighter models and parameter optimization for faster and better results.

Jett Sjöberg

09/04/2023, 11:25 PM
Install/setup will take a while, 15-20m. Average time to train is anywhere from 14-20m, depending on how many images are used. More images require more training time, more optimization, and many more epochs to work. Spin-up time will eat a large portion of the actual runtime, which is why production needs preconfigured containers.
On a T4 you can maybe expect 8-12m if you use very few images and tune params, but using fewer images may result in less variation or ability. If you want to reduce the time significantly, down to 4-6m, it is possible by using pre-defined variations, such as focusing on shoulder-up headshots or, say, generating front-end photos of an Audi.
More variation requires more training time. Reducing the variations can bring the time down.

umar ıgan

09/05/2023, 7:48 AM
I am using 4 images with AutoTrain DreamBooth in a Colab T4 environment, and this process somehow takes ~- hour to train. Based on what you said, it is supposed to be faster. I used the Colab example below and changed the model to Stable Diffusion 2: https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_Dreambooth.ipynb

Jett Sjöberg

09/05/2023, 7:07 PM
Hmm. Strange. You should be able to get decent results in 800-1200 steps on 4 images. It should take anywhere between 10-20m. Like I said, the actual set up time is the bulk of the time, not the training. I will have a look a bit later when I have some time.

umar ıgan

09/05/2023, 7:20 PM
Thank you 🙏 I can share my modified notebook as well. I found another notebook that actually runs in 20-30 mins, but this one is slower for some reason.

Jett Sjöberg

09/05/2023, 8:37 PM
No problem. I would guess it has something to do with deps configuration.
Ah. I see. Did you select 512x512 and scale your images appropriately using something such as birme, or did you leave the default 1024x1024? This is training 24 images of an Audi Q7.
You can adjust batch size and gradient_accum depending on how many images you train on; however, the defaults usually work pretty well.
generated_image.png
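To make the batch size and gradient accumulation point concrete, here is a rough sketch of how those two settings translate a step budget into epochs. It assumes the step count means optimizer updates and that no extra class images are mixed in; the function and its field names are illustrative, not part of any library.

```python
import math

def training_schedule(num_images, max_train_steps,
                      train_batch_size=1, gradient_accumulation_steps=1):
    """Estimate epochs and per-image exposure for a fixed optimizer-step budget."""
    effective_batch = train_batch_size * gradient_accumulation_steps
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "epochs": math.ceil(max_train_steps / steps_per_epoch),
        "passes_per_image": max_train_steps * effective_batch / num_images,
    }

print(training_schedule(4, 800))     # the 4-image case from this thread
print(training_schedule(24, 1200))   # the 24-image Audi Q7 case
```

With the defaults of 1 and 1, 800 steps on 4 images works out to 200 passes over each image, which is why a small set needs no more steps than the 800-1200 range already quoted.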

umar ıgan

09/06/2023, 1:33 PM
I can train it in one of the example notebooks, which uses accelerate and diffusers, but the autotrain-advanced one takes a long time. Fast one: https://colab.research.google.com/drive/1I5ryM7rMh-p_5fYIb2lW2hMa095AwHPZ Slow one: https://colab.research.google.com/drive/1C4Q26qCDF-3pq2F0Wv9vib7HNkDwPvC_

Jett Sjöberg

09/06/2023, 6:25 PM
Does the autotrain-advanced notebook still take a long time even with the resolution reduced to 512x512?

umar ıgan

09/06/2023, 6:38 PM
Yes,
I liked that one because it’s easier to process as a task in a technical discussion.

Jett Sjöberg

09/06/2023, 6:55 PM
That is strange. With basic params it should work on the free plan with a training cycle of ~20 minutes. How many images are you training on? I was able to train on 24 images at 512x512 on SD-2-1-base in about 20 minutes using the autotrain-advanced notebook. The other notebook is essentially the same notebook, with a few alterations such as class generation. Even so, the autotrain notebook should perform nearly identically on either T4 instance. I assume you are using the free plan resources? Have you used a lot of resources? Possibly you are being throttled?

umar ıgan

09/06/2023, 8:00 PM
I use 4 to 6 images. I am also subscribed to Colab Pro but only ever have a T4 available. Could you run the slow one I shared in 20 mins?

Jett Sjöberg

09/06/2023, 8:02 PM
I sent a request for access.

umar ıgan

09/06/2023, 8:06 PM
Shared

Jett Sjöberg

09/06/2023, 8:08 PM
What dataset are you testing on?

umar ıgan

09/06/2023, 8:33 PM
There should be 3-4 different image URLs: basically anime, LoL game art, and some classic art style datasets.
I use a few images from one of them and fine-tune on those images.

Jett Sjöberg

09/06/2023, 8:40 PM
Which dataset? The "doggo" dataset is 846x982px, animes is 64x64px, and cartoon is 4130x6916px. The art dataset is unavailable. Images are pulled down with no scaling, and training defaults to 1024px.
If you are using images that are not pre-scaled, they need to buffer out. Be sure your images are scaled properly for training dataset, and set resolution to 512x512 in params. You can scale/crop images with birme or you can write a short opencv snippet that will scale/crop the training images after pulling them down from huggingface.
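A quick way to see the mismatch is to compare each dataset's image size against the training resolution before starting a run. The sketch below uses the three sizes quoted above; the function name and report fields are made up for illustration.

```python
def audit_resolution(sizes, target=512):
    """For each (width, height), report what it takes to reach target x target."""
    report = {}
    for name, (width, height) in sizes.items():
        short_side = min(width, height)
        report[name] = {
            "scale": round(target / short_side, 3),  # factor applied to the short side
            "needs_crop": width != height,           # non-square images must be cropped
            "upscaled": short_side < target,         # upscaling adds no real detail
        }
    return report

# Sizes as quoted in the thread.
sizes = {"doggo": (846, 982), "animes": (64, 64), "cartoon": (4130, 6916)}
report = audit_resolution(sizes)
for name, info in report.items():
    print(name, info)
```

The 64x64 anime images would need an 8x upscale to reach 512 and 16x to reach the 1024 default, so they are the weakest fit of the three.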

umar ıgan

09/06/2023, 8:46 PM
By pre-scaled, you mean I should scale them to exactly 512x512? Sorry, but there are limited resources on this topic out there.

Jett Sjöberg

09/06/2023, 8:47 PM
Before training, images need to be scaled to 512x512 and cropped appropriately.

umar ıgan

09/06/2023, 8:47 PM
Hah, OK, you wrote it up there, so it should be the same as the resolution.

Jett Sjöberg

09/06/2023, 8:47 PM
The images in this notebook are improper resolution.

umar ıgan

09/06/2023, 8:47 PM
I understand now, thanks a lot 🙏

Jett Sjöberg

09/06/2023, 8:51 PM
Params both need to be set to 512px for resolution. Right now it defaults to 1024, which is 1024x1024px. So if you choose anime, which is 64x64px, but leave the default 1024 in params, it reserves memory for 1024-size tensors. This is a huge memory requirement and computational overhead vs the 512 resolution setting. So, resolution needs to be set to 512, and images need to be appropriately scaled and cropped. 😉
I suggest using birme to crop your images: https://www.birme.net, or writing a short opencv snippet that will scale/crop images after they are downloaded.
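A back-of-the-envelope sketch of why 1024 costs so much more than 512. It assumes Stable Diffusion's usual 8x VAE downsampling and counts pairwise interactions in the highest-resolution self-attention layer; real memory use depends on the attention implementation, so treat these as rough ratios rather than measurements.

```python
VAE_DOWNSAMPLE = 8  # Stable Diffusion's VAE maps each 8x8 pixel block to one latent position

def resolution_cost(resolution):
    """Rough size figures for one square training image at the given resolution."""
    latent_side = resolution // VAE_DOWNSAMPLE
    tokens = latent_side * latent_side          # positions seen by self-attention
    return {
        "pixels": resolution * resolution,
        "latent_side": latent_side,
        "tokens": tokens,
        "attention_pairs": tokens * tokens,     # naive attention scales quadratically
    }

small, large = resolution_cost(512), resolution_cost(1024)
print("pixels ratio:", large["pixels"] // small["pixels"])
print("attention ratio:", large["attention_pairs"] // small["attention_pairs"])
```

Doubling the side length quadruples the pixels and latent positions, and the naive attention term grows 16x, which is consistent with the large slowdown seen at the 1024 default on a T4.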

umar ıgan

09/06/2023, 8:54 PM
I actually found out that Pillow has resize functionality, so I resize the images during download, inside the download function.
👍 1
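A minimal sketch of that Pillow approach, assuming a center crop to a square followed by a resize to 512x512. This is one possible version, not the code from the shared notebook; the function names and file paths are hypothetical.

```python
def center_crop_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

def resize_for_training(src_path, dst_path, size=512):
    """Center-crop an image file to a square and save it at size x size."""
    from PIL import Image  # Pillow; imported lazily so the box helper works without it
    with Image.open(src_path) as img:
        img = img.convert("RGB")
        img = img.crop(center_crop_box(*img.size))
        img = img.resize((size, size), Image.LANCZOS)
        img.save(dst_path)

# Crop box for an 846x982 image, the "doggo" size quoted earlier in the thread.
print(center_crop_box(846, 982))
```

Calling `resize_for_training("raw/img_0.jpg", "train/img_0.jpg")` right after each download keeps every training image at the resolution set in the params.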

Jett Sjöberg

09/06/2023, 8:57 PM
Totally can do. I use opencv because I can select portion with face or object via yolo.
If you try to train on the whatsapp images in the repo on your github, resolution must be scaled. For sure. If it is very few images for single train, as demo, I use birme. If it is batching for production, I do it in code.
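As an illustration of the detector-driven crop, the sketch below takes a bounding box of the kind a YOLO-style detector returns and expands it into a square crop that stays inside the image. The detector itself is left out, and the box format `(x1, y1, x2, y2)` in pixels and the 20% margin are assumptions made for this example.

```python
def square_crop_around(box, image_size, margin_pct=20):
    """Expand a detection box into a square crop clamped to the image bounds."""
    x1, y1, x2, y2 = box
    width, height = image_size
    # Square side: longest box edge plus a margin, never larger than the image.
    side = max(x2 - x1, y2 - y1) * (100 + margin_pct) // 100
    side = min(side, width, height)
    center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2
    # Shift the square back inside the image if it would spill over an edge.
    left = min(max(center_x - side // 2, 0), width - side)
    top = min(max(center_y - side // 2, 0), height - side)
    return (left, top, left + side, top + side)

# Hypothetical detection in a 1000x800 image.
print(square_crop_around((100, 100, 300, 400), (1000, 800)))
```

The returned tuple can be passed to Pillow's `Image.crop` or used as slice bounds on an OpenCV array before resizing to 512x512.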

umar ıgan

09/06/2023, 9:00 PM
IMG_0718.jpg
Yeah, since this is a task, I kept it simple to show a flow.

Jett Sjöberg

09/06/2023, 9:02 PM
Nice. Glad it is working.