
added support for MAX_INPUT_TOKENS #61


Closed
wants to merge 1 commit

Conversation

VishalYadavCF

Issue #59


@gemini-code-assist (bot) left a comment


Hello @VishalYadavCF, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request addresses issue #59 by adding support for MAX_INPUT_TOKENS. The primary change involves adjusting the MAX_INPUT_TOKENS variable in api/rag.py to a lower value (3400) and passing this value to the embedder's model_kwargs during initialization. This ensures that the configured token limit is respected when using the embedding model.

Highlights

  • Token Limit Adjustment: The MAX_INPUT_TOKENS variable in api/rag.py has been adjusted from 7500 to 3400 to align with the embedding model's 4096 token limit.
  • Embedder Configuration: The MAX_INPUT_TOKENS value is now passed to the embedder's model_kwargs during initialization, ensuring the model respects the configured token limit.

Changelog

  • api/rag.py
    • Adjusted MAX_INPUT_TOKENS from 7500 to 3400.
    • Passed MAX_INPUT_TOKENS to the embedder's model_kwargs (see the sketch below).
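
For illustration, the two changes together might look roughly like the following in api/rag.py. This is a minimal sketch, not the exact diff; the Embedder name, the model_client key, and the embedder_config structure are assumptions standing in for the project's real objects.

# api/rag.py (sketch)

# Maximum token limit for embedding models; kept well below the
# embedding model's 4096-token context limit.
MAX_INPUT_TOKENS = 3400

# ... later, when the embedder is built (Embedder and embedder_config are
# assumed names, not copied verbatim from the repository):
embedder = Embedder(
    model_client=embedder_config["model_client"](),
    model_kwargs={
        **embedder_config["model_kwargs"],
        "max_input_tokens": MAX_INPUT_TOKENS,  # respect the configured limit
    },
)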
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A token's count,
Must stay within the bound,
Or errors arise.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

The pull request introduces support for MAX_INPUT_TOKENS to limit the number of tokens used by the embedding model. This is a good change as it allows for better control over resource usage and can prevent errors due to exceeding token limits. The changes are straightforward and seem well-implemented.

Merge Readiness

The code changes are well-implemented and address the issue of token limits for embedding models. I don't see any critical or high severity issues. I am unable to approve the pull request, and recommend that others review and approve this code before merging.

@@ -45,7 +45,7 @@ def append_dialog_turn(self, dialog_turn):
 logger = logging.getLogger(__name__)

 # Maximum token limit for embedding models
-MAX_INPUT_TOKENS = 7500  # Safe threshold below 8192 token limit
+MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit


Severity: medium

Consider adding a comment explaining why the token limit was adjusted to 3400. This will help future developers understand the reasoning behind this specific value.


Suggested change
-MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit
+MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit. Testing indicates best performance at this level.

Comment on lines +235 to +238

model_kwargs={
    **embedder_config["model_kwargs"],
    "max_input_tokens": MAX_INPUT_TOKENS,  # Use the configured token limit
},


Severity: medium

It's good that you're passing MAX_INPUT_TOKENS to the embedder. Consider adding a brief comment explaining why this is important, such as preventing the input from exceeding the model's token limit.


Suggested change
-model_kwargs={
-    **embedder_config["model_kwargs"],
-    "max_input_tokens": MAX_INPUT_TOKENS,  # Use the configured token limit
-},
+model_kwargs={
+    **embedder_config["model_kwargs"],
+    "max_input_tokens": MAX_INPUT_TOKENS,  # Prevents exceeding model's token limit
+},
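
For context on why the cap matters: if a chunk of text exceeds the embedding model's context window, the embedding call fails outright. The following standalone sketch shows one way to pre-truncate text to the limit; it uses tiktoken and a hypothetical helper name, neither of which is taken from this repository.

import tiktoken

MAX_INPUT_TOKENS = 3400  # value chosen in this PR

def truncate_to_token_limit(text: str, max_tokens: int = MAX_INPUT_TOKENS) -> str:
    """Return text cut down to at most max_tokens tokens (cl100k_base encoding assumed)."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])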

@sng-asyncfunc (Contributor)

Should we make this dynamically defined from .env? I removed it initially because I was seeing weird results, especially when switching between different models.
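
If the limit were read from .env instead of being hard-coded, a minimal sketch could look like the following; the variable name MAX_EMBEDDER_INPUT_TOKENS and the 3400 fallback are hypothetical, not existing project settings.

import os

DEFAULT_MAX_INPUT_TOKENS = 3400  # hypothetical fallback when the env var is unset or invalid

def get_max_input_tokens() -> int:
    raw = os.environ.get("MAX_EMBEDDER_INPUT_TOKENS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_MAX_INPUT_TOKENS
    return value if value > 0 else DEFAULT_MAX_INPUT_TOKENS

MAX_INPUT_TOKENS = get_max_input_tokens()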

@lujmo11 commented May 21, 2025

Shouldn't the max tokens be specified per provider/model, since they can differ per model? So maybe they should be set in api/config/embedder.json as model-specific model_kwargs, and then rag.py would read them in?
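
A sketch of that per-model approach, assuming api/config/embedder.json carried a max_input_tokens entry inside model_kwargs (the key names and file layout here are guesses, not the repository's actual schema):

import json

# Expected shape (hypothetical):
# { "embedder": { "model_kwargs": { "model": "...", "max_input_tokens": 3400 } } }
with open("api/config/embedder.json") as f:
    embedder_config = json.load(f)

model_kwargs = embedder_config["embedder"]["model_kwargs"]
MAX_INPUT_TOKENS = model_kwargs.get("max_input_tokens", 3400)  # per-model value, with a fallback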
