Length specific pricing bands for gemini-1.5-flash-latest #53

lukestanley · 2024-06-16T14:49:25Z

I saw on https://ai.google.dev/pricing that the latest Gemini models have 2 different pricing bands, based on the prompt length.
To support this, the pricing logic and data structure might need some changes.

Maybe something like this:

def calculate_cost_by_tokens(model_name, input_tokens, output_tokens):
    prices = model_prices[model_name]

    # Check for the presence of specific token limits and pricing structures
    if 'input_cost_per_token_short' in prices and 'max_input_tokens_short' in prices:
        max_input_tokens_short = prices['max_input_tokens_short']
        if input_tokens <= max_input_tokens_short:
            input_cost_per_token = prices['input_cost_per_token_short']
            output_cost_per_token = prices['output_cost_per_token_short']
        else:
            input_cost_per_token = prices['input_cost_per_token_long']
            output_cost_per_token = prices['output_cost_per_token_long']
    else:
        input_cost_per_token = prices['input_cost_per_token']
        output_cost_per_token = prices['output_cost_per_token']

    input_cost = input_tokens * input_cost_per_token
    output_cost = output_tokens * output_cost_per_token
    total_cost = input_cost + output_cost

    return total_cost

So the pricing data might need to be stored like this:

{
    "gemini-1.5-flash-latest": {
        "max_tokens": 8192,
        "max_input_tokens": 1000000,
        "max_input_tokens_short": 128000,
        "max_output_tokens": 8192,
        "input_cost_per_token_short": 3.5e-07,
        "input_cost_per_token_long": 7e-07,
        "output_cost_per_token_short": 1.0500000000000001e-06,
        "output_cost_per_token_long": 2.1000000000000002e-06,
        "litellm_provider": "vertex_ai-language-models",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "source": "https://ai.google.dev/pricing"
    }
}

Proposed new optional properties:

max_input_tokens_short: The maximum number of input tokens for the lower pricing band.
input_cost_per_token_short: Cost per input token when the input token count is within the max_input_tokens_short limit.
input_cost_per_token_long: Cost per input token when the input token count exceeds the max_input_tokens_short limit.
output_cost_per_token_short: Cost per output token when the input token count is within the max_input_tokens_short limit.
output_cost_per_token_long: Cost per output token when the input token count exceeds the max_input_tokens_short limit.
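As a rough check of how these properties would interact, here is a minimal sketch that applies the band rule at the boundary. The `PRICES` dict and the `banded_cost` helper are hypothetical names used only for illustration; the values are the example numbers from this issue, not reviewed prices.

```python
# Hypothetical price entry using the example values proposed in this issue.
PRICES = {
    "max_input_tokens_short": 128000,
    "input_cost_per_token_short": 3.5e-07,
    "input_cost_per_token_long": 7e-07,
    "output_cost_per_token_short": 1.05e-06,
    "output_cost_per_token_long": 2.1e-06,
}


def banded_cost(prices, input_tokens, output_tokens):
    """Pick the band from the input token count, then price input and output with it."""
    # The short band is inclusive of the limit; anything above it uses the long band.
    band = "short" if input_tokens <= prices["max_input_tokens_short"] else "long"
    return (
        input_tokens * prices[f"input_cost_per_token_{band}"]
        + output_tokens * prices[f"output_cost_per_token_{band}"]
    )


# One prompt exactly at the limit, one a single token over it.
short_cost = banded_cost(PRICES, 128_000, 1_000)
long_cost = banded_cost(PRICES, 128_001, 1_000)
print(f"at limit: {short_cost:.7f}  one over: {long_cost:.7f}")
```

Note that under this rule a single extra input token switches the whole request, output tokens included, to the long band, so the boundary comparison (`<=` versus `<`) is worth pinning down in tests.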

Obviously it'd need tests and the numbers reviewing. Wouldn't be surprised if I've got at least 0 off-by-one errors! ;)

Seems like a very useful library, I didn't find the info I needed so hope this helps.
What do you think?
@areibman


areibman · 2024-06-16T22:36:11Z

This would be great, actually. The main considerations are around how to manage the cost dictionary:

The big challenge is that we rely on a 3rd party cost dictionary maintained by LiteLLM. We also have a function that pulls the latest dictionary from their repo and updates the TOKEN_COSTS variable.

I raised an issue to LiteLLM about this just now. I think your solution makes sense, but we'd need to figure out how to update the cost dictionary first. Let's see if LiteLLM is willing to make the change, and if not, we could potentially add a sidecar dictionary that we merge prices with in the meantime.
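For illustration, the sidecar idea could look roughly like the sketch below. This is only an assumption about how such a merge might work: `merge_sidecar`, `upstream_costs`, and `sidecar_costs` are hypothetical names, and the dictionaries are tiny stand-ins rather than the real LiteLLM data.

```python
import copy


def merge_sidecar(upstream, sidecar):
    """Return a new dict in which sidecar fields override or extend upstream entries."""
    merged = copy.deepcopy(upstream)  # leave the pulled upstream data untouched
    for model, overrides in sidecar.items():
        # Create the model entry if upstream lacks it, then layer the sidecar fields on top.
        merged.setdefault(model, {}).update(overrides)
    return merged


# Hypothetical stand-in for the dictionary pulled from upstream.
upstream_costs = {
    "gemini-1.5-flash-latest": {
        "input_cost_per_token": 3.5e-07,
        "output_cost_per_token": 1.05e-06,
    }
}

# Hypothetical local sidecar carrying only the band-specific fields.
sidecar_costs = {
    "gemini-1.5-flash-latest": {
        "max_input_tokens_short": 128000,
        "input_cost_per_token_long": 7e-07,
    }
}

token_costs = merge_sidecar(upstream_costs, sidecar_costs)
```

Merging into a copy means a later refresh from upstream can be re-merged with the same sidecar without the two sources getting tangled.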

areibman · 2024-06-17T20:00:12Z

@lukestanley LiteLLM just merged some new changes that make this easier: BerriAI/litellm#4229 (comment)

I don't have a ton of capacity this week, but happy to merge if you raise a PR (looks like you wrote 90% of the code anyway). Otherwise I'll get to it when I get to it.

lukestanley · 2024-06-17T21:17:18Z

The upstream change is interesting, but it makes me a bit concerned about scalability: there could be a multitude of token-count-specific variables to check to find the applicable price rule, and I wonder how much complexity that would need.
I'm curious what LiteLLM's cost calculations are like! I'll try and check it out tomorrow. If they have a cost estimation function and a compatible license, I wonder if copying it directly in a somewhat automated way might make more sense.
Anyhow it's late here but I'll try and look into it tomorrow.
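To make the scalability concern concrete: one possible alternative to per-band variable names is a sorted list of tiers, so adding a band does not add new keys to check. The sketch below is an assumption for discussion, not LiteLLM's actual format; `TIERS` and `tiered_cost` are hypothetical names, and the prices reuse the example values from this issue.

```python
from bisect import bisect_left

# Hypothetical layout: (inclusive upper bound on input tokens, input price, output price),
# sorted by upper bound, with an unbounded final tier.
TIERS = [
    (128_000, 3.5e-07, 1.05e-06),
    (float("inf"), 7e-07, 2.1e-06),
]


def tiered_cost(tiers, input_tokens, output_tokens):
    """Find the first tier whose upper bound is >= input_tokens and price with it."""
    bounds = [upper for upper, _, _ in tiers]
    # bisect_left returns the leftmost index whose bound is >= input_tokens,
    # so a count exactly equal to a bound stays in that (lower) tier.
    idx = bisect_left(bounds, input_tokens)
    _, input_price, output_price = tiers[idx]
    return input_tokens * input_price + output_tokens * output_price


print(tiered_cost(TIERS, 100_000, 500))
print(tiered_cost(TIERS, 200_000, 500))
```

With this shape, a third band would be one more tuple in the list rather than another set of `_short`/`_long` style fields.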

punkpeye · 2024-09-24T23:46:04Z

Looks like upstream changes have been merged to litellm.

lukestanley · 2024-09-28T19:46:54Z

Unfortunately I don't have any capacity to work on this right now, but of course if any of the code is useful in any way, feel free to use it.

areibman mentioned this issue Jun 16, 2024

[Feature]: Length-based pricing for Google models BerriAI/litellm#4229

Closed
