added support for MAX_INPUT_TOKENS #61
Conversation
Hello @VishalYadavCF, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request addresses issue #59 by adding support for `MAX_INPUT_TOKENS`. The primary change involves adjusting the `MAX_INPUT_TOKENS` variable in `api/rag.py` to a lower value (3400) and passing this value to the embedder's `model_kwargs` during initialization. This ensures that the configured token limit is respected when using the embedding model.
Highlights
- Token Limit Adjustment: The `MAX_INPUT_TOKENS` variable in `api/rag.py` has been adjusted from 7500 to 3400 to align with the embedding model's 4096 token limit.
- Embedder Configuration: The `MAX_INPUT_TOKENS` value is now passed to the embedder's `model_kwargs` during initialization, ensuring the model respects the configured token limit.
Changelog
- `api/rag.py`
  - Adjusted `MAX_INPUT_TOKENS` from 7500 to 3400.
  - Passed `MAX_INPUT_TOKENS` to the embedder's `model_kwargs` (see the sketch below).
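For illustration, a minimal sketch of the shape of this change, assuming a dictionary-based `embedder_config` like the one shown in the diff excerpts below; the actual initialization in `api/rag.py` may differ:

```python
from typing import Any, Dict

# Conservative threshold for an embedding model with a 4096-token limit (per this PR).
MAX_INPUT_TOKENS = 3400


def build_embedder_kwargs(embedder_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the change: keep whatever is configured in
    embedder_config["model_kwargs"] and pin "max_input_tokens" to MAX_INPUT_TOKENS."""
    return {
        **embedder_config["model_kwargs"],
        "max_input_tokens": MAX_INPUT_TOKENS,
    }
```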
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
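For example, commenting `/gemini review` on this pull request triggers a fresh code review of its current state, and `/gemini summary` regenerates the summary above.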
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A token's count,
Must stay within the bound,
Or errors arise.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The pull request introduces support for `MAX_INPUT_TOKENS` to limit the number of tokens used by the embedding model. This is a good change, as it allows for better control over resource usage and can prevent errors due to exceeding token limits. The changes are straightforward and seem well-implemented.
Merge Readiness
The code changes are well-implemented and address the issue of token limits for embedding models. I don't see any critical or high-severity issues. I am unable to approve the pull request and recommend that others review and approve this code before merging.
```diff
@@ -45,7 +45,7 @@ def append_dialog_turn(self, dialog_turn):
 logger = logging.getLogger(__name__)

 # Maximum token limit for embedding models
-MAX_INPUT_TOKENS = 7500  # Safe threshold below 8192 token limit
+MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit
```
Consider adding a comment explaining why the token limit was adjusted to 3400. This will help future developers understand the reasoning behind this specific value.
Suggested change:

```diff
-MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit
+MAX_INPUT_TOKENS = 3400  # Adjusted for embedding model with 4096 token limit. Testing indicates best performance at this level.
```
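As a side note on why a buffer below the hard 4096-token limit can be useful, here is a hedged sketch of counting and truncating input before it reaches the embedder. `tiktoken` and the `cl100k_base` encoding are illustrative assumptions, not necessarily what this project uses:

```python
import tiktoken  # assumption: illustrative only, not necessarily a project dependency

MAX_INPUT_TOKENS = 3400  # conservative threshold below the model's 4096-token limit


def truncate_to_token_limit(text: str, limit: int = MAX_INPUT_TOKENS) -> str:
    """Return `text` truncated so its token count does not exceed `limit`."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    if len(tokens) <= limit:
        return text
    return enc.decode(tokens[:limit])
```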
```python
model_kwargs={
    **embedder_config["model_kwargs"],
    "max_input_tokens": MAX_INPUT_TOKENS,  # Use the configured token limit
},
```
It's good that you're passing the `MAX_INPUT_TOKENS` to the embedder. Consider adding a brief comment explaining why this is important, such as preventing exceeding the model's token limit.
Suggested change:

```diff
 model_kwargs={
     **embedder_config["model_kwargs"],
-    "max_input_tokens": MAX_INPUT_TOKENS,  # Use the configured token limit
+    "max_input_tokens": MAX_INPUT_TOKENS,  # Prevents exceeding model's token limit
 },
```
Should we define this more dynamically from .env? I removed it initially because I was seeing weird results, especially when switching between different models.
Shouldn't the max tokens be specified per provider/model, since they can differ from model to model?
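A sketch of one way to combine both suggestions, reading an override from the environment and falling back to a per-model default. The environment variable name, model keys, and numbers below are illustrative placeholders, not part of this PR:

```python
import os

# Placeholder per-model defaults; real limits vary by provider/model and should be verified.
DEFAULT_MODEL_TOKEN_LIMITS = {
    "example-8192-token-model": 7500,  # safe threshold below an 8192-token limit
    "example-4096-token-model": 3400,  # safe threshold below a 4096-token limit
}


def resolve_max_input_tokens(model_name: str, fallback: int = 3400) -> int:
    """Resolve the token limit: env override first, then per-model default, then fallback."""
    env_value = os.getenv("MAX_INPUT_TOKENS")
    if env_value is not None:
        return int(env_value)
    return DEFAULT_MODEL_TOKEN_LIMITS.get(model_name, fallback)
```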
Issue #59