@github-actions github-actions bot commented Oct 9, 2025

This PR adds benchmark results for the qwen/qwen3-vl-30b-a3b-thinking model.

The following files have been updated:

  • src/benchmark/results.json - Raw benchmark results
  • src/benchmark/validation-results.json - Validation results against human baseline

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the untested list to prevent re-processing.

@alrocar

vercel bot commented Oct 9, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| llm-benchmark | Ready | Preview | Comment | Oct 9, 2025 6:12pm |

@cursor cursor bot left a comment

"details": "Results don't match",
"humanRowCount": 10,
"llmRowCount": 0,
"sql": "%\nSELECT repo_name, COUNT(*) AS stars\nFROM github_events\nWHERE event_type = 'WatchEvent'\n AND created_at >= {{DateTime(start_date)}} AND created_at < {{DateTime(end_date)}}\nGROUP BY repo_name\nORDER BY stars DESC\nLIMIT 10"

Bug: Invalid SQL Query Syntax and Template Errors

The SQL query for qwen/qwen3-vl-30b-a3b-thinking contains invalid syntax, starting with a '%' character and including unresolved {{DateTime(...)}} template variables. This suggests the query wasn't properly generated or processed, preventing its execution in the benchmark.
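
The two symptoms named above (a leading `%` and leftover `{{...}}` placeholders) can be checked mechanically before a query is executed. The following is a minimal sketch only: `find_template_issues` and `PLACEHOLDER_RE` are hypothetical names, not part of the benchmark code, and the check assumes the target engine expects plain SQL. If the engine treats `%` and `{{...}}` as its own template syntax, these findings would not be errors and the check would need adjusting.

```python
import re

# Hypothetical helper for illustration; not taken from the benchmark repository.
# Matches any {{ ... }} placeholder that sits on a single line.
PLACEHOLDER_RE = re.compile(r"\{\{.*?\}\}")


def find_template_issues(sql: str) -> list[str]:
    """Return human-readable findings for a generated SQL string."""
    issues = []
    # Flag a leading '%' (ignoring leading whitespace).
    if sql.lstrip().startswith("%"):
        issues.append("query starts with '%'")
    # Flag every placeholder that was not substituted with a value.
    for match in PLACEHOLDER_RE.finditer(sql):
        issues.append(f"unresolved placeholder: {match.group(0)}")
    return issues


# The query string quoted in the validation result above.
sql = (
    "%\nSELECT repo_name, COUNT(*) AS stars\nFROM github_events\n"
    "WHERE event_type = 'WatchEvent'\n"
    " AND created_at >= {{DateTime(start_date)}}"
    " AND created_at < {{DateTime(end_date)}}\n"
    "GROUP BY repo_name\nORDER BY stars DESC\nLIMIT 10"
)
issues = find_template_issues(sql)
for issue in issues:
    print(issue)
```

Run against the query from the validation result, the sketch reports the leading `%` and both `DateTime` placeholders, and it returns an empty list for a query that contains neither.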
