Replies: 2 comments 1 reply
-
With your newest improvement, my inference time is reduced from 54-58 s/step down to 44-48 s/step. Windows 10 laptop; Intel i5-8250U; 12 GB RAM @ 2600 MHz; generating a 512 x 512 image. I'm using your Windows-ready AVX2 build.
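As a quick sanity check on those numbers, taking the midpoints of the two reported ranges puts the improvement at roughly an 18% reduction in per-step time (a small sketch; the 54-58 and 44-48 figures come from the comment above):

```python
# Midpoints of the reported step-time ranges (seconds per step).
before = (54 + 58) / 2  # original build
after = (44 + 48) / 2   # newest improvement
reduction = (before - after) / before * 100
print(f"{reduction:.1f}% less time per step")
```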
-
Hey @leejet. Using BLAS it's 44-48 s/step, but without BLAS it's 34-38 s/step. I wonder how this could happen. Does BLAS itself not perform well in sd.cpp? Ubuntu Jammy 22.04 laptop; Intel i5-8250U; 12 GB RAM @ 2600 MHz; generating a 512 x 512 image.
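For scale, the midpoints of those ranges put the BLAS build at roughly 28% more time per step than the plain build (a quick calculation from the figures in the comment above; whether this comes from BLAS threading overhead on the 4-core i5-8250U or something else is not established here):

```python
# Midpoints of the reported step-time ranges (seconds per step).
with_blas = (44 + 48) / 2     # BLAS-enabled build
without_blas = (34 + 38) / 2  # plain build
overhead = (with_blas - without_blas) / without_blas * 100
print(f"BLAS build takes {overhead:.1f}% more time per step")
```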
-
Original
./bin/sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -v
Improvement 1
./bin/sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -v
Improvement 2
./bin/sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -v