Total Tokens & Total Cost by Model was not being output correctly… #53

Closed

Conversation

daihiraoka

1. Issue Summary

  1. Total Tokens by Model was not being output correctly.
  2. Total Cost by Model was not being output.
  3. AI system names were being identified as "n/a" instead of their correct names.
  4. IntervalTokens was consistently reported as 0 in token usage collection and reporting.

2. Investigation Results

  1. The AI system name inference logic was inadequate, always outputting "n/a".
  2. The MetricsCollectorService class was not correctly processing token usage metrics (gen_ai.client.token.usage).
  3. The token aggregation logic in the LLMDc class's collectData method was problematic.
  4. The token count and cost calculation logic was incomplete.
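
  Context: as the maintainer notes further down in this conversation, traceloop versions above 0.18.2 switch to unified metric names beginning with gen_ai, report the provider name in gen_ai.system, and use a histogram metric type for tokens; a Sum-based processing path would therefore see no token data, which is consistent with IntervalTokens being reported as 0.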

3. Interim Solutions Implemented

3.1 MetricsCollectorService Class Changes

  1. Improved AI system inference logic:

    private String inferAiSystem(String metricName, String aiSystem) {
        if (aiSystem == null || aiSystem.isEmpty() || aiSystem.equals("n/a")) {
            if (metricName.startsWith("gen_ai.")) {
                return "openai";
            } else if (metricName.startsWith("llm.")) {
                String[] parts = metricName.split("\\.", 3);
                if (parts.length > 1) {
                    return parts[1];
                }
            }
            return "unknown";
        }
        return aiSystem;
    }
  2. Enhanced token processing in the processSumMetric method (a sketch of the assumed attribute extraction follows this list):

    if (!modelId.isEmpty()) {
        OtelMetric otelMetric = new OtelMetric();
        otelMetric.setModelId(modelId);
        otelMetric.setAiSystem(aiSystem);
        if (tokenType.compareTo("prompt") == 0 || tokenType.compareTo("input") == 0) {
            otelMetric.setPromptTokens(tokens);
        } else if (tokenType.compareTo("completion") == 0 || tokenType.compareTo("output") == 0) {
            otelMetric.setCompleteTokens(tokens);
        } else {
            otelMetric.setPromptTokens(tokens);
            otelMetric.setCompleteTokens(tokens);
        }
        exportMetrics.add(otelMetric);
    }
  3. Improved logging for better debugging and traceability.
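
  With the inference logic in item 1, a metric named gen_ai.client.token.usage whose ai-system attribute is missing or "n/a" is attributed to "openai", while a metric name beginning with "llm." (for example, a hypothetical llm.watsonx.completion.tokens) is attributed to the segment after "llm.", i.e. "watsonx".

  The token-type branch in item 2 assumes that modelId, aiSystem, and tokenType have already been read from the data point's attributes. The sketch below is only an illustration of that extraction, not the actual processSumMetric code; the attribute keys (gen_ai.response.model, gen_ai.request.model, gen_ai.system, gen_ai.token.type) are assumptions borrowed from the OpenTelemetry GenAI conventions and may not match what a given traceloop version emits:

    import java.util.Map;

    // Hypothetical helper (illustration only): resolve the fields consumed by
    // the token-type branch in item 2 from a data point's attribute map.
    final class TokenUsageAttributes {
        final String modelId;
        final String aiSystem;
        final String tokenType;

        TokenUsageAttributes(Map<String, String> attributes) {
            // Prefer the response model, fall back to the request model.
            this.modelId = attributes.getOrDefault("gen_ai.response.model",
                    attributes.getOrDefault("gen_ai.request.model", ""));
            // Provider name, e.g. "openai" or "watsonx"; "n/a" triggers inferAiSystem.
            this.aiSystem = attributes.getOrDefault("gen_ai.system", "n/a");
            // Token type, e.g. "input"/"output" (older traceloop: "prompt"/"completion").
            this.tokenType = attributes.getOrDefault("gen_ai.token.type", "");
        }
    }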

3.2 LLMDc Class Changes

  1. Enhanced config.yaml settings logging:

    private void logLLMSpecificConfig(Map<String, Object> properties) {
        logger.info("LLM Specific Configuration:");
        logger.info("  OPENAI_PRICE_PROMPT_TOKES_PER_KILO: " + 
                    properties.getOrDefault(OPENAI_PRICE_PROMPT_TOKES_PER_KILO, "Not set"));
        logger.info("  OPENAI_PRICE_COMPLETE_TOKES_PER_KILO: " + 
                    properties.getOrDefault(OPENAI_PRICE_COMPLETE_TOKES_PER_KILO, "Not set"));
        // Log other relevant configuration values
    }
  2. Improved token and cost calculation in the collectData method (a worked numeric example follows this list):

    double intervalPromptTokens = (double)deltaPromptTokens/intervalSeconds;
    double intervalCompleteTokens = (double)deltaCompleteTokens/intervalSeconds;
    double intervalTotalTokens = intervalPromptTokens + intervalCompleteTokens;
    double intervalPromptCost = (intervalPromptTokens/1000) * pricePromptTokens;
    double intervalCompleteCost = (intervalCompleteTokens/1000) * priceCompleteTokens;
    double intervalTotalCost = intervalPromptCost + intervalCompleteCost;
  3. Added separate methods for getting prompt and complete token prices:

    private double getPricePromptTokens(String aiSystem) {
        switch (aiSystem) {
            case "watsonx": return watsonxPricePromptTokens;
            case "openai": return openaiPricePromptTokens;
            case "anthropic": return anthropicPricePromptTokens;
            default: return 0.0;
        }
    }
    
    private double getPriceCompleteTokens(String aiSystem) {
        switch (aiSystem) {
            case "watsonx": return watsonxPriceCompleteTokens;
            case "openai": return openaiPriceCompleteTokens;
            case "anthropic": return anthropicPriceCompleteTokens;
            default: return 0.0;
        }
    }
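
  As a worked numeric example of the interval calculation in item 2 (using the openai prompt price of 0.0005 per 1k tokens shown in section 5): with deltaPromptTokens = 3000 over a 60-second interval, intervalPromptTokens = 3000 / 60 = 50, and intervalPromptCost = (50 / 1000) * 0.0005 = 0.000025. Values of this magnitude are why the cost display issue noted in section 5 appears. Note also that getPricePromptTokens and getPriceCompleteTokens return 0.0 for any unrecognized aiSystem, so costs for unknown providers are reported as zero.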

4. Results After Modifications

  • IntervalTokens is now correctly reported with non-zero values.
  • IntervalCost is calculated and reported (though values are very small).
  • AI system names are correctly identified and reported, no longer showing as "n/a".
  • Different AI systems and models are correctly identified and reported.
  • All config.yaml settings, including pricing, are logged at startup for verification.

5. Remaining Issues and Recommendations

  1. Total Cost by Model Display: UI improvements are needed to handle the very small values that result from per-kilo pricing, e.g.:
    openai.price.prompt.tokens.per.kilo: 0.0005 # 1M tokens = $0.50 ---> 1k tokens = $0.0005
    openai.price.complete.tokens.per.kilo: 0.0015 # 1M tokens = $1.50 ---> 1k tokens = $0.0015

  2. Token Count and Cost Calculation Precision: May need adjustment.

  3. AI System Name Inference: Current logic prioritizes OpenAI; it should be expanded to handle WatsonX and Anthropic equally.

  4. Multiple Price Inputs for Different Models: Support entering different prices for different models, such as GPT-4 and GPT-3.5, to account for pricing variations (see the sketch after this list).
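
A minimal sketch of how the per-model pricing in point 4 could be supported, assuming a hypothetical lookup table keyed by provider and model with a fallback to the existing per-provider price; the class, key format, and prices below are illustrative only and not part of the current code or configuration:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical per-model price table with a per-provider fallback.
    final class ModelPriceTable {
        private final Map<String, Double> perModel = new HashMap<>();
        private final Map<String, Double> perProvider = new HashMap<>();

        void putModelPrice(String aiSystem, String modelId, double pricePerKilo) {
            perModel.put(aiSystem + ":" + modelId, pricePerKilo);
        }

        void putProviderPrice(String aiSystem, double pricePerKilo) {
            perProvider.put(aiSystem, pricePerKilo);
        }

        // Fall back to the provider-level price (the current behaviour) when no
        // model-specific price is configured; 0.0 if neither is known.
        double price(String aiSystem, String modelId) {
            return perModel.getOrDefault(aiSystem + ":" + modelId,
                    perProvider.getOrDefault(aiSystem, 0.0));
        }
    }

For example, putModelPrice("openai", "gpt-4", 0.03) together with putProviderPrice("openai", 0.0005) would price gpt-4 prompts differently from other OpenAI models while keeping the existing default.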

These interim modifications have addressed the major issues.

Note:
These changes were implemented by a non-Java expert and are interim solutions.
The accuracy and completeness of the solutions are not guaranteed, and further review and testing by Java experts are recommended.

Contact: [email protected]

@liurui-1 requested review from jinsongo and liyanwei93 on July 15, 2024 04:44
jinsongo (Collaborator) commented Jul 15, 2024

@daihiraoka The DC needs to work with traceloop 0.18.2; I think that's why you encountered many problems. Please reference the document: https://www.ibm.com/docs/en/instana-observability/current?topic=technologies-monitoring-llms

If we need to support traceloop above 0.18.2, we must sync with the related traceloop PRs, such as getting the AI provider name directly from gen_ai.system, using the unified metric names beginning with gen_ai, and using the histogram metric type for tokens, etc.

daihiraoka (Author)

@jinsongo
Regarding the previous issue, as you pointed out, the problem was due to the version of traceloop I was using. I was using a version above 0.18.2, which caused the issues. By switching to version 0.18.2, I was able to get it working properly.

I apologize for any inconvenience caused.

Thanks!

jinsongo (Collaborator)

@daihiraoka As you can see, Traceloop is in a phase of rapid development and iteration, with even some critical aspects still changing. That is why we have had to pin to a specific version of Traceloop for a certain period. Some parts of the current code might seem confusing because Traceloop is still making improvements, which keeps us from settling on a stable implementation. Later, we plan to update and optimize our monitoring code based on a relatively stable version of Traceloop.
Thank you for your contribution. I will reference your improvements and suggestions for the next release.
