CUDA-graph-compatible releasing and resuming KV cache and model weight memory #3532
| Job | Run time |
|---|---|
| | 3m 47s |
| | 9m 48s |
| | 5m 21s |
| | 6m 22s |
| | 11m 25s |
| | 9m 52s |
| | 16m 12s |
| | 13m 16s |
| | 12m 20s |
| | 13m 19s |
| | 13m 5s |
| | 15m 20s |
| | 1s |
| Total | 2h 10m 8s |