Conversation

@VonEquinox
Contributor

…en counting accuracy

  • Fix HTTP client resource leak by implementing FastAPI lifespan manager
  • Fix security vulnerability in authorization header parsing (case-insensitive)
  • Fix incorrect timestamps using file modification time instead of current time
  • Fix silent JSON parsing failures with proper logging
  • Fix token counting to include tool_calls and system prompts
  • Add comprehensive test script for token counting validation

Resolves:

  • Memory leaks in long-running deployments
  • Potential authorization bypass vulnerability
  • Inaccurate token usage reporting (20-50% underestimation)
  • Debugging difficulties due to silent failures

@funnycups requested a review from Copilot on October 19, 2025 at 11:16

Copilot AI left a comment

Pull Request Overview

Fixes multiple production issues: HTTP client lifecycle leaks, case-insensitive Authorization handling, timestamp correctness, more robust JSON coercion, and significantly improved token counting (including tool_calls). Also adds a script to validate token counting.

  • Implement FastAPI lifespan to initialize/close a shared httpx.AsyncClient (see the sketch after this list)
  • Harden Authorization header parsing and improve token counting for tool_calls and system prompts
  • Replace file mtime timestamps with current time; improve JSON coercion logs; add token counting test script
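
For reference, a minimal sketch of the lifespan pattern described in the first bullet, assuming the shared client is stored on app.state (the attribute name http_client is illustrative, not necessarily the PR's actual wiring):

from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Create one shared client at startup and reuse it for every upstream call.
    app.state.http_client = httpx.AsyncClient()
    try:
        yield
    finally:
        # Close the client on shutdown so pooled connections are released.
        await app.state.http_client.aclose()

app = FastAPI(lifespan=lifespan)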

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File | Description
test_token_counting.py | Adds a script to exercise token counting across scenarios and models.
main.py | Introduces a lifespan-managed HTTP client, strengthens Authorization parsing, improves token counting (tool_calls and timing), fixes timestamps, and enhances JSON coercion logging.

Comment on lines +1149 to +1163
# Count content tokens
content_tokens = token_counter.count_text_tokens(completion_text, body.model) if completion_text else 0

# Count tool_calls tokens if present
tool_calls_tokens = 0
if completion_tool_calls:
    # Create a temporary message to count tool_calls tokens
    temp_msg = {"role": "assistant", "tool_calls": completion_tool_calls}
    # Count just the tool_calls part
    tool_calls_tokens = token_counter.count_tokens([temp_msg], body.model)
    # Subtract the message overhead to get just tool_calls tokens
    tool_calls_tokens -= 3  # Remove assistant priming overhead
    logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls")

estimated_completion_tokens = content_tokens + tool_calls_tokens

Copilot AI Oct 19, 2025

tool_calls token isolation subtracts only the reply priming (+3) but not the per-message overhead (3 for most models, 4 for gpt-3.5-turbo/gpt-35-turbo). This overestimates tool_calls tokens by 3–4 tokens. Subtract both the per-message overhead and the reply priming, e.g.: compute msg_overhead = 4 if model startswith(('gpt-3.5-turbo','gpt-35-turbo')) else 3, then do tool_calls_tokens -= (msg_overhead + 3). Even better, add a TokenCounter helper to return these overheads to avoid duplicating model-specific logic.

Comment on lines +1367 to +1368
tool_calls_tokens -= 3 # Remove assistant priming overhead
logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream")

Copilot AI Oct 19, 2025

Same issue as non-stream path: only the reply priming (+3) is removed, but the per-message overhead isn't. This inflates tool_calls token counts by 3–4 tokens. Subtract both the per-message overhead (3 for most models, 4 for gpt-3.5-turbo/gpt-35-turbo) and the +3 reply priming.

Suggested change
- tool_calls_tokens -= 3  # Remove assistant priming overhead
- logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream")
+ # Subtract both per-message overhead and reply priming (+3)
+ # Most models: per-message overhead is 3; gpt-3.5-turbo/gpt-35-turbo: 4
+ per_message_overhead = 4 if body.model in ("gpt-3.5-turbo", "gpt-35-turbo") else 3
+ tool_calls_tokens -= (per_message_overhead + 3)
+ logger.debug(f"🔧 Counted {tool_calls_tokens} tokens for {len(completion_tool_calls)} tool calls in stream (subtracted {per_message_overhead}+3 overhead)")

import os

# Add parent directory to path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

Copilot AI Oct 19, 2025

The comment says 'Add parent directory', but the code adds the script's own directory. Either update the comment or actually add the parent with os.path.dirname(os.path.dirname(os.path.abspath(__file__))).

Suggested change
- sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

Comment on lines +1149 to +1163

Copilot AI Oct 19, 2025

The logic to estimate completion tokens (content + tool_calls with model-specific overhead corrections) is duplicated here and in the streaming path. Consider extracting a single helper (e.g., TokenCounter.count_tool_calls_tokens(tool_calls, model) and a shared estimate_completion_tokens(content, tool_calls, model)) to centralize the model-specific overhead handling and avoid divergence.
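
One possible shape for that helper, reusing the count_tokens call from the snippet above; this is a sketch of the suggestion under those assumptions, not the project's actual implementation:

class TokenCounter:
    # ... existing counting methods ...

    def count_tool_calls_tokens(self, tool_calls, model: str) -> int:
        # Wrap the tool_calls in a throwaway assistant message, count it,
        # then strip both the per-message overhead (3 for most models,
        # 4 for gpt-3.5-turbo/gpt-35-turbo) and the +3 reply priming so
        # only the tool_calls tokens remain.
        temp_msg = {"role": "assistant", "tool_calls": tool_calls}
        total = self.count_tokens([temp_msg], model)
        per_message_overhead = 4 if model.startswith(("gpt-3.5-turbo", "gpt-35-turbo")) else 3
        return max(total - (per_message_overhead + 3), 0)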

Comment on lines +957 to +966
# Check authorization header format (case-insensitive)
if not authorization.lower().startswith("bearer "):
    raise HTTPException(status_code=401, detail="Invalid authorization header format")

# Extract the token (everything after "Bearer ")
client_key = authorization[7:].strip()

# Ensure the key is not empty
if not client_key:
    raise HTTPException(status_code=401, detail="Missing API key")

Copilot AI Oct 19, 2025

Parsing Authorization by slicing index 7 assumes a single space and no extra whitespace. Use robust parsing to handle multiple spaces/tabs: scheme, _, token = authorization.partition(' '); if scheme.lower() != 'bearer' or not token.strip(): raise HTTPException(...). This avoids edge cases and keeps the check case-insensitive.
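
A minimal sketch of that partition-based check, reusing the names from the snippet above:

scheme, _, token = authorization.partition(" ")
if scheme.lower() != "bearer":
    raise HTTPException(status_code=401, detail="Invalid authorization header format")

# strip() tolerates extra whitespace between the scheme and the token
client_key = token.strip()
if not client_key:
    raise HTTPException(status_code=401, detail="Missing API key")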

@funnycups
Owner

Thank you for your contribution! After a careful review, I would suggest the following changes:

  1. Remove the unnecessary test_token_counting.py file.
  2. Counting tokens based on the actual prompt and the model's generated XML/text instructions is more accurate. This approach is already implemented in the codebase, so the estimation logic for tool calls can now be removed.
  3. I saw your contribution to unify some mixed usage of bytes and str. However, there's still a mix of str and bytes in the streaming SSE output. I would suggest using bytes only in all SSE yields (see the sketch below).

Thanks again for your great work on Toolify!
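
As an illustration of point 3, a minimal sketch of a bytes-only SSE generator (the function name and chunk shape here are hypothetical, not Toolify's actual code):

import json

async def stream_sse(chunks):
    # Encode every frame up front so the generator never mixes str and bytes.
    async for chunk in chunks:
        payload = json.dumps(chunk, ensure_ascii=False)
        yield f"data: {payload}\n\n".encode("utf-8")
    yield b"data: [DONE]\n\n"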
