Use a different minimal case #80526
case-001 is the "hard" case that Opus always fails; case-003 is a better smoke test to make sure things are working.
No actionable comments were generated in the recent review.
Review profile: CHILL. Files selected for processing: 1.
Estimated code review effort: 1 (Trivial), ~2 minutes. Pre-merge checks: 15 passed.
[REHEARSALNOTIFIER]
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.
Interacting with pj-rehearse: Once you are satisfied with the results of the rehearsals, comment:
@stbenjam: all tests passed!
/pj-rehearse ack
@stbenjam: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: petr-muller, stbenjam.
Summary by CodeRabbit
This PR updates the OpenShift CI configuration for the openshift-eng/ai-helpers repository to improve the reliability of its evaluation smoke test.
Change: The eval-payload-analysis-minimal test's EVAL_CASES environment variable is changed from case-001 to case-003. This minimal test is a fast-feedback smoke test that runs evaluation workloads with reduced scope (250 max turns instead of 2500) to quickly verify the system is functioning.
Rationale: The PR replaces case-001, which is identified as a hard test case that consistently fails when evaluated with Claude Opus, with case-003, which serves as a more reliable baseline for verifying system functionality. This change ensures the smoke test doesn't get blocked on a known problematic test case and can provide useful early feedback during CI runs.
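This PR does not show how the eval job consumes EVAL_CASES. As a rough illustration only, the sketch below assumes the variable holds a comma-separated list of case IDs; the helper name selected_cases is hypothetical and is not taken from the ai-helpers repository. It shows how changing the value from case-001 to case-003 would change which case the minimal job runs.

```python
import os


def selected_cases(env=None):
    """Return the case IDs named by EVAL_CASES.

    Assumption: the variable is a comma-separated list of case IDs.
    The real eval harness may parse it differently.
    """
    env = os.environ if env is None else env
    raw = env.get("EVAL_CASES", "")
    # Drop empty entries so "case-003," or an unset variable is handled cleanly.
    return [case.strip() for case in raw.split(",") if case.strip()]


# Before this PR the minimal job pointed at the known-hard case;
# after it, the job points at the case used as a smoke test.
before = selected_cases({"EVAL_CASES": "case-001"})
after = selected_cases({"EVAL_CASES": "case-003"})
print("before:", before)
print("after:", after)
```

Passing a plain dict instead of reading os.environ directly keeps the sketch easy to check without touching the real environment.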