-
-
Notifications
You must be signed in to change notification settings - Fork 55
feat(nanogpt): add NanoGPT provider with subscription quota tracking #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements NanoGPT provider featuring: - Subscription-based usage tracking with daily/monthly limits via API - Dynamic model discovery from NanoGPT /api/v1/models endpoint - Tier-based priority (subscription-active > subscription-grace > no-subscription) - Background job refreshing quota every 5 minutes - Quota groups for daily (2000) and monthly (60000) limits display in TUI Technical details: - Uses NanoGptQuotaTracker mixin for subscription API integration - Implements get_usage_reset_config() for per_model tracking mode - Adds API key prefix detection (sk-nano-*) in UsageManager - Adds NanoGPT to _REQUEST_COUNT_PROVIDERS for proper request counting
|
I'm starting my review of the NanoGPT provider implementation and subscription quota tracking. Diving into the code now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall Assessment
The addition of the NanoGPT provider and the subscription quota tracking is a significant improvement for the TUI's awareness of daily/monthly limits. The background job integration and dynamic model discovery are well-implemented. However, there are a few architectural issues regarding how these quotas are enforced and reset within the UsageManager framework.
Architectural Feedback
- Quota Enforcement: Currently, real models (like
gpt-4o) are not in the same quota group as the_dailyand_monthlytrackers. This means theUsageManagerwill not block these models even when the subscription limit is reached. - Usage Resets: The use of
field_name="daily"inper_modelmode creates a mismatch in theUsageManager, which will likely break the usage reset logic.
Key Suggestions
- Link Real Models to Groups: Ensure that all models for the NanoGPT provider belong to a shared quota group (e.g.,
"daily") so that the background quota updates correctly affect selection for all models. - Fix field_name Mismatch: Switch to the default
"models"field name or ensure theUsageManagerlogic matches your custom field. - Cleanup Config: Remove the redundant and contradictory
usage_reset_configsclass attribute.
Nitpicks and Minor Points
- Update the fallback model list to use current Gemini versions (1.5/2.0).
- Consider passing the semaphore to
fetch_subscription_usagefor tighter concurrency control.
Questions for the Author
- Does NanoGPT support multiple concurrent quota groups for a single model (Daily and Monthly)? If not, we might need to sync the most restrictive limit to a single group.
This review was generated by an AI assistant.
| def get_model_quota_group(self, model: str) -> Optional[str]: | ||
| """ | ||
| Get the quota group for a model. | ||
| NanoGPT tracks quota via virtual models: | ||
| - _daily: belongs to "daily" quota group | ||
| - _monthly: belongs to "monthly" quota group | ||
| - Other models: no group (tracked individually) | ||
| Args: | ||
| model: Model name | ||
| Returns: | ||
| Quota group name or None if not grouped | ||
| """ | ||
| # Strip provider prefix if present | ||
| clean_model = model.split("/")[-1] if "/" in model else model | ||
|
|
||
| if clean_model == "_daily": | ||
| return "daily" | ||
| elif clean_model == "_monthly": | ||
| return "monthly" | ||
|
|
||
| # Real models are not grouped - they're tracked individually | ||
| # Their usage doesn't sync with the virtual quota models | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical issue: Real models (like gpt-4o) return None for their quota group, while only the virtual _daily and _monthly models are grouped. This means that when the background job marks _daily as exhausted, real models will not be blocked because they don’t share a group with it. All NanoGPT models should be linked to the same quota pool to ensure exhaustion is respected.
| def get_usage_reset_config(self, credential: str) -> Optional[Dict[str, Any]]: | ||
| """ | ||
| Return usage reset configuration for NanoGPT credentials. | ||
| NanoGPT uses per_model mode to track usage at the model level, | ||
| with daily and monthly quotas managed via the background job. | ||
| Args: | ||
| credential: The API key (unused, same config for all) | ||
| Returns: | ||
| Configuration with per_model mode | ||
| """ | ||
| return { | ||
| "mode": "per_model", | ||
| "window_seconds": 86400, # 24 hours (daily quota reset) | ||
| "field_name": "daily", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a configuration mismatch here. You are setting field_name="daily" in per_model mode. However, UsageManager.record_success hardcodes the storage key to "models" when in per_model mode, while the reset logic (_check_window_reset) uses the field_name from this config. This will cause usage to be recorded in one place but reset in another, effectively preventing resets. Recommend using the default "models" field name for per_model mode.
| usage_reset_configs = { | ||
| "default": UsageResetConfigDef( | ||
| window_seconds=24 * 60 * 60, # 24 hours | ||
| mode="credential", # All models share daily quota | ||
| description="Daily subscription quota (UTC midnight reset)", | ||
| field_name="daily", | ||
| ), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This usage_reset_configs class attribute is currently dead code because it is completely overridden by the get_usage_reset_config method at line 126. It is also contradictory (using mode="credential" here vs mode="per_model" below). Recommend removing this or unifying it.
| "gpt-4o-mini", | ||
| "claude-3.5-sonnet", | ||
| "claude-3.5-haiku", | ||
| "gemini-2.5-flash", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these model names correct? gemini-2.5-flash and gemini-2.5-pro are not currently valid models (Gemini is at 1.5/2.0). Also, check if NanoGPT expects the google/ prefix.
| response = await new_client.get(url, headers=headers) | ||
| response.raise_for_status() | ||
| data = response.json() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating a new httpx.AsyncClient here, it is not subject to the QUOTA_FETCH_CONCURRENCY semaphore defined in the provider. While the current background job caller handles concurrency, it might be safer to pass the semaphore or an existing client into this method to avoid connection spikes if called from elsewhere.
- Link all real models to 'daily' quota group for exhaustion enforcement - Keep _monthly in separate group to prevent cross-syncing of limits - Remove field_name='daily' mismatch from get_usage_reset_config - Remove dead usage_reset_configs class attribute - Pass shared httpx client to quota refresh for concurrency control
|
/mirrobot-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up Review
Thanks for the quick updates! You've addressed several of the key concerns from the previous review, particularly the httpx.AsyncClient concurrency and the field_name mismatch in the usage tracking config.
However, I've identified a significant issue with the quota synchronization logic that will prevent real models from being properly blocked or updated based on the subscription API data.
Assessment of New Changes:
- Fixed:
httpx.AsyncClientis now shared and respects concurrency limits. - Fixed:
get_usage_reset_confignow uses the default storage key, aligning it with theUsageManagerreset logic. - Fixed: Real models are now assigned to the
"daily"quota group. - Remaining Issue: The implementation of
get_models_in_quota_groupcurrently breaks the synchronization mechanism for real models. See the inline comment for details. - Remaining Issue: Fallback Gemini model names are still incorrect.
Overall Status:
Changes are requested due to the synchronization blocker. Once that's addressed, this will be in great shape.
This review was generated by an AI assistant.
| "gpt-4o-mini", | ||
| "claude-3.5-sonnet", | ||
| "claude-3.5-haiku", | ||
| "gemini-2.5-flash", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-flagging this from the previous review: gemini-2.5-flash and gemini-2.5-pro are not valid model names (Gemini is currently at 1.5/2.0). These should likely be updated to reflect current models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are valid models, this is incorrect.
| def get_models_in_quota_group(self, group: str) -> List[str]: | ||
| """ | ||
| Get the canonical model(s) for quota group stats display. | ||
| This returns only the virtual quota tracker models for display purposes. | ||
| All real models share these quota pools via get_model_quota_group(). | ||
| Args: | ||
| group: Quota group identifier | ||
| Returns: | ||
| List with the virtual quota tracker model | ||
| """ | ||
| if group == "daily": | ||
| # Only _daily for stats display - it has the baseline from API | ||
| return ["_daily"] | ||
| elif group == "monthly": | ||
| return ["_monthly"] | ||
| return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation of get_models_in_quota_group only returns the virtual tracker models (_daily, _monthly). This is a blocker for the synchronization logic in UsageManager.update_quota_baseline.
The UsageManager relies on this method to find all models belonging to a group so it can sync the request_count, baseline, and cooldowns across them. By only returning the virtual models here, real models like gpt-4o will never have their local usage counts or cooldowns synced with the authoritative data fetched from the subscription API.
You should return all models belonging to the group, potentially using self._discovered_models for the "daily" group.
| return "monthly" | ||
|
|
||
| # All other models (real models + _daily) share the daily quota | ||
| return "daily" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, real models are only linked to the "daily" quota group. If a user hits their monthly subscription limit, these models will not be blocked by the UsageManager because they aren't part of the "monthly" group.
Since UsageManager only supports one group per model, consider either:
- Linking them to the most restrictive group.
- In
update_quota_baselinefor_daily, syncing the most restrictive remaining fraction between daily and monthly.
|
|
||
| # Quota groups for tracking daily and monthly limits | ||
| # These are virtual models used to track subscription-level quota | ||
| model_quota_groups = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model_quota_groups class attribute appears to be redundant now that you have explicitly implemented get_models_in_quota_group and get_model_quota_group. It can be removed to avoid confusion.
- Add _subscription_models set to track quota-limited models - Fetch subscription-only models from /api/subscription/v1/models - Real models belong to 'monthly' group (hard limit, blocks queries) - _daily belongs to 'daily' group (soft limit, display only) - Only monthly group includes subscription models for sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up Review
This update makes great progress! I've verified that the quota synchronization logic is now correctly linking real models to the "monthly" quota group, which was a major blocker in the previous turn.
However, I've identified one remaining logic issue regarding the daily limit enforcement that could lead to unnecessary request failures.
Assessment of New Changes:
- Fixed:
get_models_in_quota_groupnow includes real subscription models in the"monthly"group, allowing proper baseline synchronization. - Improved: Clear distinction between daily and monthly limits in the provider logic.
- Remaining Issue: Real models are currently only grouped under
"monthly". Since the monthly limit (60,000) is much larger than the daily limit (2,000), a credential that has exhausted its daily quota will still show a very high remaining fraction for the monthly group (~96%). This prevents theUsageManagerfrom proactively blocking the key, leading to wasted requests until a reactive cooldown is triggered. - Cleanup: The redundant
model_quota_groupsattribute should be removed.
Overall Status:
Changes are requested to address the daily limit synchronization for real models. Once the more restrictive fraction is used for the baseline, this will be in excellent shape.
This review was generated by an AI assistant.
|
|
||
| # Quota groups for tracking daily and monthly limits | ||
| # These are virtual models used to track subscription-level quota | ||
| model_quota_groups = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model_quota_groups class attribute is now redundant and inconsistent with the dynamic logic in get_models_in_quota_group. It should be removed to avoid confusion.
| # Real models + _monthly belong to monthly (hard limit) | ||
| return "monthly" | ||
|
|
||
| def get_models_in_quota_group(self, group: str) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job updating this to include self._subscription_models! This correctly addresses the synchronization blocker identified in the previous review, ensuring that real models now receive quota baselines from the background job.
| max_requests=daily_limit, | ||
| reset_timestamp=daily_reset_ts if daily_reset_ts > 0 else None, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a potential gap in how the daily limit is enforced. Currently, real models only track the "monthly" group.
On a fresh subscription, if the 2,000 daily limit is exhausted, the monthly fraction will still be ~96% (58,000/60,000). The UsageManager will continue to select this key for real models, leading to failed requests at the API level.
I recommend using the most restrictive fraction for the monthly group baseline to ensure the proxy proactively blocks the key when the daily limit is hit:
| # Store monthly quota baseline (synced to real models) | |
| # Use most restrictive fraction to ensure blocking on daily limit | |
| effective_fraction = min(daily_fraction, monthly_fraction) | |
| await usage_manager.update_quota_baseline( | |
| api_key, | |
| "nanogpt/_monthly", | |
| effective_fraction, | |
| max_requests=monthly_limit, | |
| reset_timestamp=monthly_reset_ts if monthly_reset_ts > 0 else None, | |
| ) |
|
@mirrobot-agent[bot] Daily vs Monthly Quota Behavior Daily limit (2,000) = Soft limit - NanoGPT does NOT reject requests when this is exhausted. Subscribers can continue making requests; the limit is informational. This is why real models are grouped with _monthly - we only want to proactively block when the hard limit is approaching. model_quota_groups Attribute The |
|
@b3nw, thanks for the clarifications on the NanoGPT quota behavior and the I'm re-evaluating the implementation with this context in mind. I'll check the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up Review
Thanks for the detailed explanation! I've reviewed your points regarding the soft vs. hard limits and the TUI discovery requirements for model_quota_groups.
Assessment of Response:
- Daily vs. Monthly: I accept that the daily limit is a soft limit and should not block requests. Your design correctly groups real models only under the "monthly" hard limit to avoid incorrect denials.
- Quota Discovery: I verified that
client.pyuses_get_effective_quota_groups()(which relies onmodel_quota_groups) for TUI discovery, so that attribute is indeed necessary. - Model Names: While the Gemini model names (
gemini-2.5-*) seemed unusual, I'll defer to the maintainer's confirmation that they are valid in this context.
The implementation is now well-aligned with both the NanoGPT API behavior and the proxy's architectural requirements.
This review was generated by an AI assistant.
|
@b3nw, I've completed my follow-up review. Based on your clarifications and my verification of the codebase, I've approved the pull request. The design for soft daily limits and the use of Great job on this provider integration! 🚀 |
feat(nanogpt): add NanoGPT provider with subscription quota tracking
Adds native NanoGPT provider support with subscription-based quota tracking, enabling the TUI to display daily and monthly usage limits.
Features
/api/v1/modelsendpoint with fallback to static listTechnical Details
NanoGptQuotaTrackermixin for subscription API integrationper_modelusage tracking mode with virtual_dailyand_monthlymodelssk-nano-*) inUsageManager._get_provider_from_credentialnanogptto_REQUEST_COUNT_PROVIDERSfor proper request countingImportant
Adds NanoGPT provider with subscription-based quota tracking, dynamic model discovery, and TUI integration for usage limits.
NanoGptProviderclass innanogpt_provider.pyfor NanoGPT API with subscription-based usage tracking./api/v1/modelsand fallback to static list.NanoGptQuotaTrackermixin innanogpt_quota_tracker.pyfor handling subscription API integration.sk-nano-*) inUsageManager._get_provider_from_credential.nanogptto_REQUEST_COUNT_PROVIDERSinusage_manager.pyfor request counting.UsageManagerto support new provider and quota tracking logic.This description was created by
for 222d902. You can customize this summary. It will automatically update as commits are pushed.