Added dynamic context size. This is perfect for servers running llama models as a service. #13295

J4e6eR · 2025-05-04T08:53:04Z

The context size, which determines how much space is allocated for model execution and the KV cache, cannot be modified once the model and context params are initialized. This is a problem for servers running models, because the required context size tends to grow over time.
With dynamic context size, there is no need to restart the server once the configured context size is exceeded.

Dynamic context size is achieved by modifying n_ctx in cparams and then resetting the previous memory to create new memory with memory.reset(model.create_memory(params_mem, cparams));.
Because new memory is created, the earlier context is deleted; the best way to preserve it is to save the state before the reset and load it afterwards.
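The save, reset, load sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyContext, reset_memory and grow_with_state are hypothetical names invented for illustration, the "state" is a plain vector of token ids, and none of this is the llama.cpp API or the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a context; NOT the llama.cpp llama_context.
struct ToyContext {
    std::size_t n_ctx;               // capacity in tokens
    std::vector<std::int32_t> cells; // tokens currently held (toy "memory")

    explicit ToyContext(std::size_t n) : n_ctx(n) { cells.reserve(n); }

    // Returns false when the context is full.
    bool push(std::int32_t token) {
        if (cells.size() >= n_ctx) return false;
        cells.push_back(token);
        return true;
    }

    // Recreate the memory with a new capacity; the old contents are discarded.
    void reset_memory(std::size_t new_n_ctx) {
        n_ctx = new_n_ctx;
        cells = std::vector<std::int32_t>();
        cells.reserve(new_n_ctx);
    }
};

// Grow the context while preserving its contents: save, reset, load.
// Returns false if the requested size cannot hold the saved state.
bool grow_with_state(ToyContext & ctx, std::size_t new_n_ctx) {
    if (new_n_ctx < ctx.cells.size()) return false;
    const std::vector<std::int32_t> saved = ctx.cells;             // save state
    ctx.reset_memory(new_n_ctx);                                   // earlier context is gone here
    ctx.cells.insert(ctx.cells.end(), saved.begin(), saved.end()); // load state
    assert(ctx.cells.size() == saved.size());
    return true;
}
```

In this toy, a context created with a capacity of 2 rejects a third push, and after grow_with_state(ctx, 4) it accepts the push again with the first two tokens still in place.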

I will make loading the saved state the default behaviour of this operation in the next commit.


J4e6eR · 2025-05-05T10:50:42Z

The next goal is to get dynamic context size working without needing to reset the memory. Is it possible? Let's see!

J4e6eR · 2025-05-07T08:21:41Z

Hey @ggerganov,
Please have a look at this. It can be helpful for servers that need a dynamic context size, since it prevents them from terminating with an error when the program exceeds the context size.
I am currently working on the follow-up task I posted earlier.
If there are any changes you would like me to make to improve this commit, I am open to suggestions.
Thank you.

ggerganov · 2025-05-07T09:42:11Z

Hi, I am not convinced that this is a useful feature. IMO the application should pre-allocate the worst-case amount of memory that it plans to use. This way, if it is able to start, you have a guarantee that it will keep running without running out of memory at some later point.

I don't see use cases where dynamically adjusting the context has an advantage compared to the existing logic.

J4e6eR · 2025-05-07T10:44:06Z

@ggerganov
So if the application allocates a larger amount of memory beforehand, what is the significance of the context size (n_ctx)?
When I was testing one of the example programs earlier, probably simple-chat, I exceeded the context size after a few back-and-forth exchanges with the model, and the program terminated with the error message "Context size exceeded".
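The failure described above amounts to a capacity check before each decode step. A minimal sketch, assuming the chat loop tracks how many context cells are already in use; fits_in_context and check_context are hypothetical helpers written for illustration, not functions from llama.cpp, and the exact check in the example program may differ.

```cpp
#include <cstdio>

// Hypothetical helper: true when n_new more tokens still fit in the context.
bool fits_in_context(int n_ctx, int n_used, int n_new) {
    return n_used + n_new <= n_ctx;
}

// Hypothetical guard a chat loop might run before each decode step.
// Returns false instead of exiting, so the caller decides how to react
// (stop, drop old tokens, or grow the context).
bool check_context(int n_ctx, int n_used, int n_new) {
    if (!fits_in_context(n_ctx, n_used, n_new)) {
        std::fprintf(stderr, "context size exceeded: %d used + %d new > %d\n",
                     n_used, n_new, n_ctx);
        return false;
    }
    return true;
}
```

With a fixed n_ctx, the only options once this check fails are to stop or to discard earlier tokens, which is the situation this PR tries to avoid.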

J4e6eR · 2025-07-30T11:52:41Z

Finally achieved dynamic modification of the context size without resetting the memory.
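The idea of growing without a reset can be shown on a toy cache. This is a sketch under assumptions, not the implementation in this branch: ToyCache is a hypothetical type, cells are plain integers with -1 marking an empty cell, and a real KV cache lives in backend buffers where growing would likely require allocating a larger buffer and copying.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy cache: a fixed number of cells plus a count of cells in use.
struct ToyCache {
    std::vector<std::int32_t> cells; // cells.size() plays the role of n_ctx
    std::size_t n_used = 0;

    explicit ToyCache(std::size_t n_ctx) : cells(n_ctx, -1) {}

    std::size_t n_ctx() const { return cells.size(); }

    // Grow in place. std::vector::resize keeps the existing elements, so the
    // used cells survive and only empty cells (-1) are appended.
    // Shrinking is refused so that used cells are never dropped.
    bool grow(std::size_t new_n_ctx) {
        if (new_n_ctx < cells.size()) return false;
        cells.resize(new_n_ctx, -1);
        return true;
    }
};
```

Unlike the reset approach, nothing has to be saved and reloaded here: the used cells stay where they are and only the capacity changes.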


J4e6eR and others added 6 commits May 4, 2025 07:52

Added dynamic context size which is perfect for increasing model context size once the specified context size has been reached. This is perfect for servers running llama models as a service.

e0aee81

Added the support for dumping and loading the state memory in dynamic casting.

80ce4f0

Merge branch 'ggml-org:master' into dynamic-ctx

7f22263

Added the custom path support for the location of dump file.

9880169

Merge branch 'dynamic-ctx' of https://github.com/J4e6eR/llama.cpp into dynamic-ctx

5e1a4ce

Merge branch 'ggml-org:master' into dynamic-ctx

d8904d6

J4e6eR and others added 3 commits May 8, 2025 05:34

Normal changes.

525c9ae

Merge branch 'ggml-org:master' into dynamic-ctx

acaa784

Merge branch 'ggml-org:master' into dynamic-ctx

f471c74

J4e6eR force-pushed the dynamic-ctx branch 2 times, most recently from 73b85e4 to f471c74 Compare July 30, 2025 12:32

J4e6eR and others added 4 commits July 31, 2025 11:37

Merge branch 'master' of https://github.com/J4e6eR/llama.cpp into dynamic-ctx

1e7b16f

Dynamically modify the context without resetting the memory.

27456b5

Merge branch 'ggml-org:master' into dynamic-ctx

f2bd0ae

Removed File read/write overload while dynamically setting up context.

844d67a
