Replies: 1 comment
-
When you start the server, you should see lines like this:
Something slightly higher than the sum of these numbers should be a good starting point for the combination of model, context size, and context data type you use.
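The log lines themselves are not preserved here, so the following is only a rough sketch of the "sum plus a bit of headroom" arithmetic. It assumes the startup log reports sizes in the form `buffer size = <number> MiB`; that pattern, the helper name `estimate_ram_mib`, the 10% headroom, and the placeholder sample lines are all assumptions for illustration, not real llama-server output.

```python
import re

# Assumed pattern: "... buffer size = 123.45 MiB". Adjust it to match
# whatever your llama-server build actually prints at startup.
_BUFFER_RE = re.compile(r"buffer size\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*MiB")


def estimate_ram_mib(log_text, headroom=0.10):
    """Sum every reported buffer size (MiB) and add a fractional headroom."""
    total = sum(float(m.group(1)) for m in _BUFFER_RE.finditer(log_text))
    return total * (1.0 + headroom)


# Placeholder lines with made-up labels and values, NOT real server output.
sample = """\
placeholder_a: buffer size = 4000.00 MiB
placeholder_b: buffer size = 512.00 MiB
placeholder_c: buffer size = 128.00 MiB
"""

print(f"Suggested starting point: {estimate_ram_mib(sample):.1f} MiB")
```

Paste your own startup log in place of `sample`. The headroom factor is a guess, so raise it if the process still gets close to the limit under load.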
-
I need to measure how much RAM is necessary to run llama-server, so I run a script that prints its VSZ and RSS.
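The script itself is not included in this post. As a minimal sketch of one way to take such a measurement, assuming Linux and a known llama-server PID, the values can be read from `/proc/<pid>/status`, where `VmSize` corresponds to VSZ and `VmRSS` to RSS. The function names `parse_status`, `read_memory`, and `monitor` are hypothetical and are not taken from the original script.

```python
import time


def parse_status(text):
    """Return (vsz_kib, rss_kib) parsed from /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # The kernel reports both fields as "<number> kB".
    vsz = int(fields["VmSize"].split()[0])
    rss = int(fields["VmRSS"].split()[0])
    return vsz, rss


def read_memory(pid):
    """Read the current VSZ and RSS of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


def monitor(pid, interval_s=1.0, samples=5):
    """Print VSZ and RSS in MiB a fixed number of times."""
    for _ in range(samples):
        vsz, rss = read_memory(pid)
        print(f"VSZ={vsz / 1024:.1f} MiB  RSS={rss / 1024:.1f} MiB")
        time.sleep(interval_s)
```

For example, `monitor(12345, interval_s=2.0, samples=30)` would sample a process with the hypothetical PID 12345 for about a minute while requests are being sent.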
I tried sending requests continuously to the server while monitoring these metrics, and I see that they increase gradually and do not decrease.
Is this normal? Is there any way to determine the necessary RAM, or to allocate a fixed amount of RAM for llama-server?
Thank you so much