@mikecann mikecann commented Oct 21, 2025

This is part 2 of 5. The last PR was: #80

This PR's goal is to lay the groundwork for the bulk of the evals changes that are to come in the next PR.

This PR fixes up the running of tests, because right now they don't actually run on main due to some sort of out-of-order issue. To be honest, I'm not entirely sure what's going on; it could be a Windows vs OSX vs Linux issue. With this PR the tests run.

It adds some more helper functions for graders to use, which make for clearer unit tests.

It also adds a few evals so I can test these changes. The rest of them will be in the next part.

This PR also makes _write_local_results write much more context into the output file local_results.jsonl. These changes are mainly used in the fourth PR (#84), but they are also useful for LLMs inspecting the output of a given run when running locally.
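For anyone unfamiliar with the JSONL format, here is a minimal sketch of appending per-eval records to a file like local_results.jsonl and reading them back. It is not the actual _write_local_results implementation: the function names and the record fields (eval, passed, error) are hypothetical and only illustrate the one-JSON-object-per-line idea.

```python
import json
from pathlib import Path
from typing import Any


def write_results_sketch(path: Path, results: list[dict[str, Any]]) -> None:
    """Append one JSON object per line (JSONL). Hypothetical helper, not the repo's code."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for result in results:
            # Each record stays on a single line so tools can stream the file.
            f.write(json.dumps(result, sort_keys=True) + "\n")


def read_results_sketch(path: Path) -> list[dict[str, Any]]:
    """Read the records back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example records; the field names are assumptions made for illustration.
EXAMPLE_RESULTS = [
    {"eval": "example_eval_a", "passed": True, "error": None},
    {"eval": "example_eval_b", "passed": False, "error": "grader assertion failed"},
]
```

Because each line is a complete JSON object, a person or an LLM can inspect a single failing record without parsing the whole file.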
