-
Notifications
You must be signed in to change notification settings - Fork 26
fix: critical issues - resource leak, security vulnerability, and tok… #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: critical issues - resource leak, security vulnerability, and tok… #6
Conversation
…en counting accuracy - Fix HTTP client resource leak by implementing FastAPI lifespan manager - Fix security vulnerability in authorization header parsing (case-insensitive) - Fix incorrect timestamps using file modification time instead of current time - Fix silent JSON parsing failures with proper logging - Fix token counting to include tool_calls and system prompts - Add comprehensive test script for token counting validation Resolves: - Memory leaks in long-running deployments - Potential authorization bypass vulnerability - Inaccurate token usage reporting (20-50% underestimation) - Debugging difficulties due to silent failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Fixes multiple production issues: HTTP client lifecycle leaks, case-insensitive Authorization handling, timestamp correctness, more robust JSON coercion, and significantly improved token counting (including tool_calls). Also adds a script to validate token counting.
- Implement FastAPI lifespan to initialize/close a shared httpx.AsyncClient
- Harden Authorization header parsing and improve token counting for tool_calls and system prompts
- Replace file mtime timestamps with current time; improve JSON coercion logs; add token counting test script
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test_token_counting.py | Adds a script to exercise token counting across scenarios and models. |
| main.py | Introduces lifespan-managed http client, strengthens Authorization parsing, improves token counting (tool_calls and timing), fixes timestamps, and enhances JSON coercion logging. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| # Count content tokens | ||
| content_tokens = token_counter.count_text_tokens(completion_text, body.model) if completion_text else 0 | ||
|
|
||
| # Count tool_calls tokens if present | ||
| tool_calls_tokens = 0 | ||
| if completion_tool_calls: | ||
| # Create a temporary message to count tool_calls tokens | ||
| temp_msg = {"role": "assistant", "tool_calls": completion_tool_calls} | ||
| # Count just the tool_calls part | ||
| tool_calls_tokens = token_counter.count_tokens([temp_msg], body.model) | ||
| # Subtract the message overhead to get just tool_calls tokens | ||
| tool_calls_tokens -= 3 # Remove assistant priming overhead | ||
| logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls") | ||
|
|
||
| estimated_completion_tokens = content_tokens + tool_calls_tokens |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tool_calls token isolation subtracts only the reply priming (+3) but not the per-message overhead (3 for most models, 4 for gpt-3.5-turbo/gpt-35-turbo). This overestimates tool_calls tokens by 3–4 tokens. Subtract both the per-message overhead and the reply priming, e.g.: compute msg_overhead = 4 if model startswith(('gpt-3.5-turbo','gpt-35-turbo')) else 3, then do tool_calls_tokens -= (msg_overhead + 3). Even better, add a TokenCounter helper to return these overheads to avoid duplicating model-specific logic.
| tool_calls_tokens -= 3 # Remove assistant priming overhead | ||
| logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream") |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as non-stream path: only the reply priming (+3) is removed, but the per-message overhead isn't. This inflates tool_calls token counts by 3–4 tokens. Subtract both the per-message overhead (3 for most models, 4 for gpt-3.5-turbo/gpt-35-turbo) and the +3 reply priming.
| tool_calls_tokens -= 3 # Remove assistant priming overhead | |
| logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream") | |
| # Subtract both per-message overhead and reply priming (+3) | |
| # Most models: per-message overhead is 3, gpt-3.5-turbo/gpt-35-turbo: 4 | |
| per_message_overhead = 4 if body.model in ("gpt-3.5-turbo", "gpt-35-turbo") else 3 | |
| tool_calls_tokens -= (per_message_overhead + 3) | |
| logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream (subtracted {per_message_overhead}+3 overhead)") |
| import os | ||
|
|
||
| # Add parent directory to path | ||
| sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says 'Add parent directory', but the code adds the script's directory. Either update the comment or actually add the parent with os.path.dirname(os.path.dirname(os.path.abspath(file))).
| sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) | |
| sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) |
| # Count content tokens | ||
| content_tokens = token_counter.count_text_tokens(completion_text, body.model) if completion_text else 0 | ||
|
|
||
| # Count tool_calls tokens if present | ||
| tool_calls_tokens = 0 | ||
| if completion_tool_calls: | ||
| # Create a temporary message to count tool_calls tokens | ||
| temp_msg = {"role": "assistant", "tool_calls": completion_tool_calls} | ||
| # Count just the tool_calls part | ||
| tool_calls_tokens = token_counter.count_tokens([temp_msg], body.model) | ||
| # Subtract the message overhead to get just tool_calls tokens | ||
| tool_calls_tokens -= 3 # Remove assistant priming overhead | ||
| logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls") | ||
|
|
||
| estimated_completion_tokens = content_tokens + tool_calls_tokens |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to estimate completion tokens (content + tool_calls with model-specific overhead corrections) is duplicated here and in the streaming path. Consider extracting a single helper (e.g., TokenCounter.count_tool_calls_tokens(tool_calls, model) and a shared estimate_completion_tokens(content, tool_calls, model)) to centralize the model-specific overhead handling and avoid divergence.
| # Check authorization header format (case-insensitive) | ||
| if not authorization.lower().startswith("bearer "): | ||
| raise HTTPException(status_code=401, detail="Invalid authorization header format") | ||
|
|
||
| # Extract the token (everything after "Bearer ") | ||
| client_key = authorization[7:].strip() | ||
|
|
||
| # Ensure the key is not empty | ||
| if not client_key: | ||
| raise HTTPException(status_code=401, detail="Missing API key") |
Copilot
AI
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing Authorization by slicing index 7 assumes a single space and no extra whitespace. Use robust parsing to handle multiple spaces/tabs: scheme, _, token = authorization.partition(' '); if scheme.lower() != 'bearer' or not token.strip(): raise HTTPException(...). This avoids edge cases and keeps the check case-insensitive.
|
Thank you for your contribution! After a careful review I would suggest following changes:
Thanks again for your great work on Toolify! |
…en counting accuracy
Resolves: