Conversation

Contributor

@b3nw b3nw commented Jan 20, 2026

Adds native NanoGPT provider support with subscription-based quota tracking, enabling the TUI to display daily and monthly usage limits.

Features

  • Subscription Quota Tracking: Fetches and displays both daily (2,000) and monthly (60,000) request limits from the NanoGPT subscription API
  • Dynamic Model Discovery: Discovers available models from /api/v1/models endpoint with fallback to static list
  • Tier-Based Priority: Prioritizes credentials by subscription state (active > grace > none)
  • Background Quota Refresh: Syncs quota data every 5 minutes via background job
  • TUI Integration: Displays quota status with reset times in the quota viewer

Technical Details

  • Uses NanoGptQuotaTracker mixin for subscription API integration
  • Implements per_model usage tracking mode with virtual _daily and _monthly models
  • Adds API key prefix detection (sk-nano-*) in UsageManager._get_provider_from_credential
  • Adds nanogpt to _REQUEST_COUNT_PROVIDERS for proper request counting
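
The prefix-detection step described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `UsageManager._get_provider_from_credential` implementation; the mapping name and helper are assumptions.

```python
# Hypothetical sketch of prefix-based provider detection; the real
# UsageManager._get_provider_from_credential may differ in structure.
from typing import Optional

# Assumed mapping of API-key prefixes to provider names
_KEY_PREFIXES = {
    "sk-nano-": "nanogpt",
}

def get_provider_from_credential(credential: str) -> Optional[str]:
    """Infer the provider from an API key's prefix, if recognized."""
    for prefix, provider in _KEY_PREFIXES.items():
        if credential.startswith(prefix):
            return provider
    return None
```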

Important

Adds NanoGPT provider with subscription-based quota tracking, dynamic model discovery, and TUI integration for usage limits.

  • Behavior:
    • Adds NanoGptProvider class in nanogpt_provider.py for NanoGPT API with subscription-based usage tracking.
    • Implements dynamic model discovery from /api/v1/models and fallback to static list.
    • Prioritizes credentials by subscription state (active > grace > none).
    • Integrates with TUI to display daily and monthly usage limits.
  • Quota Tracking:
    • Introduces NanoGptQuotaTracker mixin in nanogpt_quota_tracker.py for handling subscription API integration.
    • Tracks daily and monthly usage limits with background refresh every 5 minutes.
    • Adds API key prefix detection (sk-nano-*) in UsageManager._get_provider_from_credential.
    • Adds nanogpt to _REQUEST_COUNT_PROVIDERS in usage_manager.py for request counting.
  • Misc:
    • Updates UsageManager to support new provider and quota tracking logic.

This description was created by Ellipsis for 222d902.

Implements NanoGPT provider featuring:
- Subscription-based usage tracking with daily/monthly limits via API
- Dynamic model discovery from NanoGPT /api/v1/models endpoint
- Tier-based priority (subscription-active > subscription-grace > no-subscription)
- Background job refreshing quota every 5 minutes
- Quota groups for daily (2000) and monthly (60000) limits display in TUI

Technical details:
- Uses NanoGptQuotaTracker mixin for subscription API integration
- Implements get_usage_reset_config() for per_model tracking mode
- Adds API key prefix detection (sk-nano-*) in UsageManager
- Adds NanoGPT to _REQUEST_COUNT_PROVIDERS for proper request counting
@b3nw b3nw requested a review from Mirrowel as a code owner January 20, 2026 03:53
@mirrobot-agent

I'm starting my review of the NanoGPT provider implementation and subscription quota tracking. Diving into the code now!


@mirrobot-agent mirrobot-agent bot left a comment


Overall Assessment

The addition of the NanoGPT provider and the subscription quota tracking is a significant improvement for the TUI's awareness of daily/monthly limits. The background job integration and dynamic model discovery are well-implemented. However, there are a few architectural issues regarding how these quotas are enforced and reset within the UsageManager framework.

Architectural Feedback

  1. Quota Enforcement: Currently, real models (like gpt-4o) are not in the same quota group as the _daily and _monthly trackers. This means the UsageManager will not block these models even when the subscription limit is reached.
  2. Usage Resets: The use of field_name="daily" in per_model mode creates a mismatch in the UsageManager, which will likely break the usage reset logic.

Key Suggestions

  • Link Real Models to Groups: Ensure that all models for the NanoGPT provider belong to a shared quota group (e.g., "daily") so that the background quota updates correctly affect selection for all models.
  • Fix field_name Mismatch: Switch to the default "models" field name or ensure the UsageManager logic matches your custom field.
  • Cleanup Config: Remove the redundant and contradictory usage_reset_configs class attribute.

Nitpicks and Minor Points

  • Update the fallback model list to use current Gemini versions (1.5/2.0).
  • Consider passing the semaphore to fetch_subscription_usage for tighter concurrency control.

Questions for the Author

  • Does NanoGPT support multiple concurrent quota groups for a single model (Daily and Monthly)? If not, we might need to sync the most restrictive limit to a single group.

This review was generated by an AI assistant.

Comment on lines 143 to 168
```python
def get_model_quota_group(self, model: str) -> Optional[str]:
    """
    Get the quota group for a model.

    NanoGPT tracks quota via virtual models:
    - _daily: belongs to "daily" quota group
    - _monthly: belongs to "monthly" quota group
    - Other models: no group (tracked individually)

    Args:
        model: Model name

    Returns:
        Quota group name or None if not grouped
    """
    # Strip provider prefix if present
    clean_model = model.split("/")[-1] if "/" in model else model

    if clean_model == "_daily":
        return "daily"
    elif clean_model == "_monthly":
        return "monthly"

    # Real models are not grouped - they're tracked individually
    # Their usage doesn't sync with the virtual quota models
    return None
```

Critical issue: Real models (like gpt-4o) return None for their quota group, while only the virtual _daily and _monthly models are grouped. This means that when the background job marks _daily as exhausted, real models will not be blocked because they don’t share a group with it. All NanoGPT models should be linked to the same quota pool to ensure exhaustion is respected.
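
A minimal sketch of the fix this comment suggests, with all models routed into a shared pool (the function shape is assumed from the excerpt above):

```python
# Sketch of the suggested fix: every NanoGPT model shares the "daily"
# pool, while the virtual _monthly tracker keeps its own group.
from typing import Optional

def get_model_quota_group(model: str) -> Optional[str]:
    clean_model = model.split("/")[-1]
    if clean_model == "_monthly":
        return "monthly"
    # _daily and every real model share the daily quota pool,
    # so exhausting _daily also blocks real models
    return "daily"
```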

Comment on lines 120 to 137
```python
def get_usage_reset_config(self, credential: str) -> Optional[Dict[str, Any]]:
    """
    Return usage reset configuration for NanoGPT credentials.

    NanoGPT uses per_model mode to track usage at the model level,
    with daily and monthly quotas managed via the background job.

    Args:
        credential: The API key (unused, same config for all)

    Returns:
        Configuration with per_model mode
    """
    return {
        "mode": "per_model",
        "window_seconds": 86400,  # 24 hours (daily quota reset)
        "field_name": "daily",
    }
```

There is a configuration mismatch here. You are setting field_name="daily" in per_model mode. However, UsageManager.record_success hardcodes the storage key to "models" when in per_model mode, while the reset logic (_check_window_reset) uses the field_name from this config. This will cause usage to be recorded in one place but reset in another, effectively preventing resets. Recommend using the default "models" field name for per_model mode.
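
The recommended correction might look like this sketch, which keeps the default storage key so recording and resetting touch the same field (field semantics assumed from the review comment):

```python
from typing import Any, Dict, Optional

def get_usage_reset_config(credential: str) -> Optional[Dict[str, Any]]:
    # Default "models" storage key: record_success writes and
    # _check_window_reset reads the same field, so resets apply.
    return {
        "mode": "per_model",
        "window_seconds": 86400,  # 24-hour daily window
        "field_name": "models",
    }
```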

Comment on lines 92 to 99
```python
usage_reset_configs = {
    "default": UsageResetConfigDef(
        window_seconds=24 * 60 * 60,  # 24 hours
        mode="credential",  # All models share daily quota
        description="Daily subscription quota (UTC midnight reset)",
        field_name="daily",
    ),
}
```

This usage_reset_configs class attribute is currently dead code because it is completely overridden by the get_usage_reset_config method at line 126. It is also contradictory (using mode="credential" here vs mode="per_model" below). Recommend removing this or unifying it.

```python
"gpt-4o-mini",
"claude-3.5-sonnet",
"claude-3.5-haiku",
"gemini-2.5-flash",
```

Are these model names correct? gemini-2.5-flash and gemini-2.5-pro are not currently valid models (Gemini is at 1.5/2.0). Also, check if NanoGPT expects the google/ prefix.

Comment on lines +106 to +109
```python
response = await new_client.get(url, headers=headers)
response.raise_for_status()
data = response.json()
```

When creating a new httpx.AsyncClient here, it is not subject to the QUOTA_FETCH_CONCURRENCY semaphore defined in the provider. While the current background job caller handles concurrency, it might be safer to pass the semaphore or an existing client into this method to avoid connection spikes if called from elsewhere.
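
One way to implement this suggestion is to accept the client as a parameter and gate the request on the shared semaphore; this is a sketch with an assumed concurrency limit, not the provider's actual code:

```python
import asyncio

# Assumed cap mirroring the provider's QUOTA_FETCH_CONCURRENCY
QUOTA_FETCH_CONCURRENCY = asyncio.Semaphore(4)

async def fetch_subscription_usage(client, url: str, headers: dict) -> dict:
    """Fetch quota JSON through a caller-supplied async HTTP client
    (e.g. an httpx.AsyncClient), gated by the shared semaphore so
    ad-hoc callers cannot spike connections."""
    async with QUOTA_FETCH_CONCURRENCY:
        response = await client.get(url, headers=headers)
        response.raise_for_status()
        return response.json()
```

Reusing one client also keeps connection pooling effective across background refreshes.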

- Link all real models to 'daily' quota group for exhaustion enforcement
- Keep _monthly in separate group to prevent cross-syncing of limits
- Remove field_name='daily' mismatch from get_usage_reset_config
- Remove dead usage_reset_configs class attribute
- Pass shared httpx client to quota refresh for concurrency control
@Mirrowel
Owner

/mirrobot-review

@Mirrowel Mirrowel added the Agent Monitored Monitored for AI Agent to review PR's and commits label Jan 20, 2026

@mirrobot-agent mirrobot-agent bot left a comment


Follow-up Review

Thanks for the quick updates! You've addressed several of the key concerns from the previous review, particularly the httpx.AsyncClient concurrency and the field_name mismatch in the usage tracking config.

However, I've identified a significant issue with the quota synchronization logic that will prevent real models from being properly blocked or updated based on the subscription API data.

Assessment of New Changes:

  • Fixed: httpx.AsyncClient is now shared and respects concurrency limits.
  • Fixed: get_usage_reset_config now uses the default storage key, aligning it with the UsageManager reset logic.
  • Fixed: Real models are now assigned to the "daily" quota group.
  • Remaining Issue: The implementation of get_models_in_quota_group currently breaks the synchronization mechanism for real models. See the inline comment for details.
  • Remaining Issue: Fallback Gemini model names are still incorrect.

Overall Status:
Changes are requested due to the synchronization blocker. Once that's addressed, this will be in great shape.

This review was generated by an AI assistant.

```python
"gpt-4o-mini",
"claude-3.5-sonnet",
"claude-3.5-haiku",
"gemini-2.5-flash",
```

Re-flagging this from the previous review: gemini-2.5-flash and gemini-2.5-pro are not valid model names (Gemini is currently at 1.5/2.0). These should likely be updated to reflect current models.

Owner:

They are valid models, this is incorrect.

Comment on lines 153 to 171
```python
def get_models_in_quota_group(self, group: str) -> List[str]:
    """
    Get the canonical model(s) for quota group stats display.

    This returns only the virtual quota tracker models for display purposes.
    All real models share these quota pools via get_model_quota_group().

    Args:
        group: Quota group identifier

    Returns:
        List with the virtual quota tracker model
    """
    if group == "daily":
        # Only _daily for stats display - it has the baseline from API
        return ["_daily"]
    elif group == "monthly":
        return ["_monthly"]
    return []
```

This implementation of get_models_in_quota_group only returns the virtual tracker models (_daily, _monthly). This is a blocker for the synchronization logic in UsageManager.update_quota_baseline.

The UsageManager relies on this method to find all models belonging to a group so it can sync the request_count, baseline, and cooldowns across them. By only returning the virtual models here, real models like gpt-4o will never have their local usage counts or cooldowns synced with the authoritative data fetched from the subscription API.

You should return all models belonging to the group, potentially using self._discovered_models for the "daily" group.
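
A sketch of that suggestion, returning every member of a group so baseline sync can reach real models (the class shape and `_discovered_models` usage are assumed from the discussion):

```python
from typing import List

class QuotaGroupsSketch:
    """Assumed shape: return every member of a group so that
    UsageManager.update_quota_baseline can sync all of them."""

    def __init__(self, discovered_models: List[str]):
        self._discovered_models = discovered_models

    def get_models_in_quota_group(self, group: str) -> List[str]:
        if group == "daily":
            # Virtual tracker plus every discovered real model
            return ["_daily"] + list(self._discovered_models)
        elif group == "monthly":
            return ["_monthly"]
        return []
```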

```python
        return "monthly"

    # All other models (real models + _daily) share the daily quota
    return "daily"
```

Currently, real models are only linked to the "daily" quota group. If a user hits their monthly subscription limit, these models will not be blocked by the UsageManager because they aren't part of the "monthly" group.

Since UsageManager only supports one group per model, consider either:

  1. Linking them to the most restrictive group.
  2. In update_quota_baseline for _daily, syncing the most restrictive remaining fraction between daily and monthly.


```python
# Quota groups for tracking daily and monthly limits
# These are virtual models used to track subscription-level quota
model_quota_groups = {
```

This model_quota_groups class attribute appears to be redundant now that you have explicitly implemented get_models_in_quota_group and get_model_quota_group. It can be removed to avoid confusion.

- Add _subscription_models set to track quota-limited models
- Fetch subscription-only models from /api/subscription/v1/models
- Real models belong to 'monthly' group (hard limit, blocks queries)
- _daily belongs to 'daily' group (soft limit, display only)
- Only monthly group includes subscription models for sync

@mirrobot-agent mirrobot-agent bot left a comment


Follow-up Review

This update makes great progress! I've verified that the quota synchronization logic is now correctly linking real models to the "monthly" quota group, which was a major blocker in the previous turn.

However, I've identified one remaining logic issue regarding the daily limit enforcement that could lead to unnecessary request failures.

Assessment of New Changes:

  • Fixed: get_models_in_quota_group now includes real subscription models in the "monthly" group, allowing proper baseline synchronization.
  • Improved: Clear distinction between daily and monthly limits in the provider logic.
  • Remaining Issue: Real models are currently only grouped under "monthly". Since the monthly limit (60,000) is much larger than the daily limit (2,000), a credential that has exhausted its daily quota will still show a very high remaining fraction for the monthly group (~96%). This prevents the UsageManager from proactively blocking the key, leading to wasted requests until a reactive cooldown is triggered.
  • Cleanup: The redundant model_quota_groups attribute should be removed.

Overall Status:
Changes are requested to address the daily limit synchronization for real models. Once the more restrictive fraction is used for the baseline, this will be in excellent shape.

This review was generated by an AI assistant.


```python
# Quota groups for tracking daily and monthly limits
# These are virtual models used to track subscription-level quota
model_quota_groups = {
```

This model_quota_groups class attribute is now redundant and inconsistent with the dynamic logic in get_models_in_quota_group. It should be removed to avoid confusion.

```python
    # Real models + _monthly belong to monthly (hard limit)
    return "monthly"

def get_models_in_quota_group(self, group: str) -> List[str]:
```

Great job updating this to include self._subscription_models! This correctly addresses the synchronization blocker identified in the previous review, ensuring that real models now receive quota baselines from the background job.

```python
    max_requests=daily_limit,
    reset_timestamp=daily_reset_ts if daily_reset_ts > 0 else None,
)
```


There is a potential gap in how the daily limit is enforced. Currently, real models only track the "monthly" group.

On a fresh subscription, if the 2,000 daily limit is exhausted, the monthly fraction will still be ~96% (58,000/60,000). The UsageManager will continue to select this key for real models, leading to failed requests at the API level.

I recommend using the most restrictive fraction for the monthly group baseline to ensure the proxy proactively blocks the key when the daily limit is hit:

Suggested change
```python
# Store monthly quota baseline (synced to real models)
# Use most restrictive fraction to ensure blocking on daily limit
effective_fraction = min(daily_fraction, monthly_fraction)
await usage_manager.update_quota_baseline(
    api_key,
    "nanogpt/_monthly",
    effective_fraction,
    max_requests=monthly_limit,
    reset_timestamp=monthly_reset_ts if monthly_reset_ts > 0 else None,
)
```

@b3nw
Contributor Author

b3nw commented Jan 20, 2026

@mirrobot-agent[bot]

Daily vs Monthly Quota Behavior
The current design intentionally groups real models only under "monthly" because of how NanoGPT's quota system works:

  • Daily limit (2,000) = Soft limit: NanoGPT does NOT reject requests when this is exhausted. Subscribers can continue making requests; the limit is informational.
  • Monthly limit (60,000) = Hard limit: NanoGPT actively rejects requests when this is exceeded.
If we proactively blocked on daily exhaustion (0/2000 remaining), we would incorrectly deny valid requests that NanoGPT would still accept. The "wasted requests" scenario you described doesn't apply because NanoGPT continues to serve requests after daily exhaustion.

This is why real models are grouped with _monthly - we only want to proactively block when the hard limit is approaching.

model_quota_groups Attribute
This attribute isn't redundant - it's required by _get_effective_quota_groups() in ProviderInterface for TUI quota group discovery. Without it, the base class returns an empty dict and quota groups won't display in the TUI.

The get_models_in_quota_group() method provides the actual models at runtime (including dynamically discovered subscription models), while model_quota_groups provides the static group names for discovery.
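
The dual mechanism described here can be sketched as follows; the class name and the hard-coded discovered model are illustrative assumptions, not the provider's actual code:

```python
from typing import Dict, List

class NanoGptProviderSketch:
    # Static group names: read by _get_effective_quota_groups()
    # in ProviderInterface so the TUI can discover the groups.
    model_quota_groups: Dict[str, List[str]] = {
        "daily": ["_daily"],
        "monthly": ["_monthly"],
    }

    def __init__(self) -> None:
        # Populated at runtime via model discovery (assumed example value)
        self._subscription_models: List[str] = ["gpt-4o"]

    def get_models_in_quota_group(self, group: str) -> List[str]:
        # Runtime membership: the monthly hard limit covers real models
        if group == "monthly":
            return ["_monthly"] + self._subscription_models
        if group == "daily":
            return ["_daily"]
        return []
```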

@mirrobot-agent

@b3nw, thanks for the clarifications on the NanoGPT quota behavior and the model_quota_groups requirement!

I'm re-evaluating the implementation with this context in mind. I'll check the ProviderInterface usage and ensure the daily/monthly distinction is handled as you've described. I'll update my review shortly.


@mirrobot-agent mirrobot-agent bot left a comment


Follow-up Review

Thanks for the detailed explanation! I've reviewed your points regarding the soft vs. hard limits and the TUI discovery requirements for model_quota_groups.

Assessment of Response:

  • Daily vs. Monthly: I accept that the daily limit is a soft limit and should not block requests. Your design correctly groups real models only under the "monthly" hard limit to avoid incorrect denials.
  • Quota Discovery: I verified that client.py uses _get_effective_quota_groups() (which relies on model_quota_groups) for TUI discovery, so that attribute is indeed necessary.
  • Model Names: While the Gemini model names (gemini-2.5-*) seemed unusual, I'll defer to the maintainer's confirmation that they are valid in this context.

The implementation is now well-aligned with both the NanoGPT API behavior and the proxy's architectural requirements.

This review was generated by an AI assistant.

@mirrobot-agent

@b3nw, I've completed my follow-up review.

Based on your clarifications and my verification of the codebase, I've approved the pull request. The design for soft daily limits and the use of model_quota_groups for TUI discovery is correct and well-implemented.

Great job on this provider integration! 🚀

@Mirrowel Mirrowel merged commit 470c67c into Mirrowel:dev Jan 21, 2026
2 of 3 checks passed
b3nw pushed a commit to b3nw/LLM-API-Key-Proxy that referenced this pull request Jan 21, 2026
feat(nanogpt): add NanoGPT provider with subscription quota tracking
Successfully merging this pull request may close the following issue:

BUG: nano-gpt provider naming conflicts with litellm integration - only custom non-standard naming works
2 participants