Replies: 5 comments 6 replies
-
Many thanks!!! It seems to be the right call, at least for my Pascal GTX 1070 Ti, so I'll revise my statements in #1166.
-
Hey people, I merged much of your feedback to main. Please update, remove all settings/cmd flags, change back to the default settings, and test the performance again. It would be helpful if you could post test statistics here (after updating). See also 31bed67.
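For anyone collecting statistics: one way to capture peak VRAM is to poll nvidia-smi while a generation runs. This is only an illustrative sketch, not something built into Forge; the sampling window is arbitrary.

```python
# Illustrative only: poll nvidia-smi once per second and report the peak
# VRAM usage seen while a generation is running.
import subprocess
import time

def vram_mib():
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]).decode()
    used, total = (int(v) for v in out.splitlines()[0].split(","))
    return used, total

peak = 0
total = 0
for _ in range(600):          # sample for ~10 minutes, roughly one generation
    used, total = vram_mib()
    peak = max(peak, used)
    time.sleep(1)
print(f"peak VRAM: {peak} MiB / {total} MiB")
```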
-
Hey guys, I just fixed some stuff, sorry, please try again 😹
-
My 1070 Ti (8 GB) with 16 GB RAM handles SD and XL models fine now without any startup args. tl;dr: Flux is not working; it crashes even though it looks like it could and should run with https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4-v2.safetensors. Could be user error, maybe I'm missing something.
I also tested a Flux model that wasn't working before: https://huggingface.co/silveroxides/flux1-nf4-weights/blob/main/flux1-schnell-nf4.safetensors, at 6 GB, which should be comparable to an XL model, so I thought it might also work on lower-memory PCs. However, it seems that adding the ae, clip_l and t5xxl_fp8_e4m3fn encoders on top is just too much. Loading the models causes lags, but after a while it gets through... and then stops with an error. Maybe it's the model itself, an incompatibility, or maybe Flux in general is too much for my system, even the smaller models. The scariest part was how much memory it tried to free.
But I'm curious and don't like to give up, so I gave https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/blob/main/flux1-dev-bnb-nf4-v2.safetensors another try... and it worked, kind of: it loaded and started generating, even at a decent ~28 s/it, but the preview only showed a white-yellow goop of a picture. After ~5 min for 512x768 at 10 steps, it crashed while finishing. So it seems like it could work; something just isn't right yet.
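For what it's worth, just adding up the published file sizes hints at why the full Flux stack is so tight on an 8 GB card. The numbers below are approximate and only an assumption based on the Hugging Face listings; Forge streams weights between RAM and VRAM, so this is an upper bound, not the resident footprint.

```python
# Very rough upper bound on the weights involved (sizes are approximate,
# read off the Hugging Face file listings; actual VRAM use is lower because
# Forge offloads/streams parts of the model, but activations add overhead).
weights_gb = {
    "flux1-schnell-nf4": 6.0,     # NF4-quantized transformer
    "t5xxl_fp8_e4m3fn": 4.9,      # T5-XXL text encoder (FP8)
    "clip_l": 0.25,               # CLIP-L text encoder
    "ae": 0.34,                   # autoencoder (VAE)
}
print(f"total weights: ~{sum(weights_gb.values()):.1f} GB vs. 8 GB of VRAM")
```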
-
Hello @lllyasviel. I just tested the newest commit as of right now and it works perfectly without any cmd args. However, I noticed that GPU Weights now behaves differently. (Note: GTX 1050 Ti, 4 GB VRAM.) Before, I needed to set it to 2440 MB, but now 1752 MB is enough (possibly +90 MB more, but this is the safer value). That leads to a max GPU memory usage of 3900 MiB as seen in nvidia-smi; setting it higher causes shared GPU memory to be used, which drastically slows down inference.
One thing I also noticed: during the first pass (768x1280) usage is 3900 MiB, but once hires. fix (1152x1920, 1.5x) kicks in, it drops to 3856 MiB with no shared GPU memory usage. RAM usage is also reduced by about 3-6 GB, from a peak of around 26.5 GB down to around 21 GB. Inference time is around 9 min 20 s ±3% using my settings above without the cmd args, which might seem worse, but it is actually much more stable. Before, there was noticeable prompt ghosting and I had to restart the Forge WebUI, but now it is gone! Version: f2.0.1v1.10.1-previous-399-g852e8856 Edit: Grammar.
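Putting those numbers side by side (the activation/overhead figure below is derived from the reported peak, not measured directly, so treat it as an assumption):

```python
# Numbers from the post above; "inference_mib" is inferred, not measured.
total_vram_mib    = 4096                      # GTX 1050 Ti
gpu_weights_mib   = 1752                      # "GPU Weights" setting in Forge
observed_peak_mib = 3900                      # nvidia-smi during the first pass

inference_mib = observed_peak_mib - gpu_weights_mib   # ~2148 MiB for activations etc.
headroom_mib  = total_vram_mib - observed_peak_mib    # ~196 MiB

print(f"activations/overhead: ~{inference_mib} MiB, headroom: ~{headroom_mib} MiB")
# Raising GPU Weights much beyond this eats the headroom and spills into
# shared (system) memory, which is what slows inference down so much.
```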
-
The new Forge build introduced many new features and improvements, but it also broke, or rather changed, how low VRAM handling works (--lowvram was removed). This affected my GPU (GTX 1050 Ti) and maybe others too (confirmations needed).
IMPORTANT! - Some parameters/settings are OUTDATED as of August 22. Refer to the comments below. Outdated info is marked with *****
CPU: Intel Xeon E3-1270 v3
RAM: 32 GB DDR3 1600 MHz
GPU: Nvidia GeForce GTX 1050 Ti, 4 GB VRAM (undervolted + overclocked)
Note: I am using a 5400 rpm HDD, so my first generation takes 13-14 min as it loads everything, but subsequent generations go down to 9-10 min.
My parameters:
- webui-user.bat*****
- Diffusion in Low Bits: Automatic*****
- Inference params
- Hires. fix
- Never OOM Integrated
OLD generation time: around 10 min 10 s ±5%
NEW generation time: around 9 min ±5%*****
1.5-2.5 min improvement in inference time.*****
Note: this build uses significantly more RAM! The old Forge build never pushed my system RAM above 19 GB, but now it goes to 24 GB+.*****
Please comment with your thoughts and point out any errors or possible improvements in my configuration.
Edit: Readability and updated-info notice.