Add benchmark results for qwen/qwen2.5-coder-7b-instruct #123
This PR is being reviewed by Cursor Bugbot
"details": "Results don't match", | ||
"humanRowCount": 1, | ||
"llmRowCount": 0, | ||
"sql": "I previously asked: \"I previously asked: \"Count all stars\"\n\nYou generated this SQL query:\n\nCount all stars for a given repository\nSelect all stars for a given repository\nSelect all stars for a given repository and user\nSelect all stars for a given repository and user and date range\nSelect all stars for a given repository and user and date range and order by created_at\nSelect all stars for a given repository and user and date range and order by created_at and limit 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as 
total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by 
actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 1\n\n\nBut it resulted in this error:\n\n{\"error\": \"DB::Exception: Syntax error: failed at position 1 ('Count') (line 1, col 1): Count all stars for a given repository\\nSelect all stars for a given repository\\nSelect all stars for a given repository and user\\nSelect all stars for a given rep. 
Expected one of: Query, Query with output, EXPLAIN, EXPLAIN, SELECT query, possibly with UNION, list of union elements, SELECT query, subquery, possibly with UNION, SELECT subquery, SELECT query, WITH, FROM, SELECT, SHOW CREATE QUOTA query, SHOW C\n\n\nPlease fix the SQL query to correctly answer my original question. Make sure the SQL is valid for Tinybird/ClickHouse.\"\n\nYou generated this SQL query:\n\nI previously asked: \"Count all stars\"\n\nYou generated this SQL query:\n\nCount all stars for a given repository\nSelect all stars for a given repository\nSelect all stars for a given repository and user\nSelect all stars for a given repository and user and date range\nSelect all stars for a given repository and user and date range and order by created_at\nSelect all stars for a given repository and user and date range and order by created_at and limit 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and 
offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars\nSelect all stars for a given repository and user and date range and order by created_at and limit 10 and offset 10 and group by actor_login and count(*) as stars and order by stars desc and limit 10 and offset 10 and sum(stars) as total_stars and sum(stars) as total_stars\nSelect all stars for a given repository and user and" |
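The sql value above is not a query. The model returned a list of natural-language descriptions, ClickHouse rejected it at position 1 ('Count'), and the retry prompt then embedded the rejected text again, so the string grows with each attempt. A pre-check along the following lines could flag such output before it is executed or fed back into a retry. This is only a sketch: the function name and the keyword list are assumptions, not part of the benchmark code, and the check is a heuristic (prose that happens to begin with "Select" would still pass).

```python
import re

# Keywords a read-only ClickHouse query would plausibly start with.
# This list is an assumption for illustration and is not exhaustive.
_SQL_START = re.compile(r"^\s*(SELECT|WITH|EXPLAIN|SHOW|DESCRIBE)\b", re.IGNORECASE)


def looks_like_sql(text: str) -> bool:
    """Heuristic: True if the text begins with a SQL query keyword."""
    return bool(_SQL_START.match(text))


# The first words of the failing output in this PR:
print(looks_like_sql("Count all stars for a given repository"))
# A plausible query shape; the table name "events" is hypothetical:
print(looks_like_sql("SELECT count() FROM events"))
```

Rejecting output that fails this check, rather than quoting it back to the model, would also keep the retry prompt from nesting earlier prompts inside itself.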
  "qwen3-coder-plus",
- "qwen3-coder-flash"
+ "qwen3-coder-flash",
+ "qwen2.5-coder-7b-instruct"
Bug: Model Name Prefix Mismatch
The qwen2.5-coder-7b-instruct model was added to the config without the qwen/ prefix. This name differs from the qwen/qwen2.5-coder-7b-instruct name used in the PR description and in the benchmark results, which may prevent the system from correctly matching configurations.
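One way to catch a mismatch like this before merging is to compare model identifiers with the provider prefix stripped. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and it assumes the prefix is everything before the first "/".

```python
def strip_provider_prefix(model_id: str) -> str:
    """Return the model name without a leading 'provider/' segment, if any."""
    return model_id.split("/", 1)[-1]


def find_prefix_mismatches(config_models, result_models):
    """Pair up ids that name the same model but differ in their provider prefix."""
    by_bare_name = {strip_provider_prefix(m): m for m in config_models}
    mismatches = []
    for result_id in result_models:
        config_id = by_bare_name.get(strip_provider_prefix(result_id))
        if config_id is not None and config_id != result_id:
            mismatches.append((config_id, result_id))
    return mismatches


# Names taken from this PR: the config entry lacks the "qwen/" prefix.
config_models = ["qwen3-coder-plus", "qwen3-coder-flash", "qwen2.5-coder-7b-instruct"]
result_models = ["qwen/qwen2.5-coder-7b-instruct"]
print(find_prefix_mismatches(config_models, result_models))
# → [('qwen2.5-coder-7b-instruct', 'qwen/qwen2.5-coder-7b-instruct')]
```

A check of this kind only reports the inconsistency; whether the config or the results should carry the prefix is a decision for the maintainers.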
This PR adds benchmark results for the qwen/qwen2.5-coder-7b-instruct model.
The following files have been updated:
- src/benchmark/results.json - Raw benchmark results
- src/benchmark/validation-results.json - Validation results against human baseline

This PR was automatically generated by the benchmark workflow.
Note: If you don't want to merge this PR, close it and the model will be added to the untested list to prevent re-processing.
@alrocar