This PR is based upon this one:
#77
As mentioned in the above, there is only so far we can go with unit-test-based grading. There are lots of places where a task involves something that isn't directly testable.
For example, 001/008 asks the AI to use "helper functions" that it then calls from its queries. This is not verifiable via unit tests.
That's what this PR is about: it allows us to test those tasks by introducing "AI Grading".
It feeds a model the task and the generated output, and asks it for a pass/fail verdict plus a couple of sentences explaining its reasoning.
This works very well in all my testing thus far.
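For illustration, here's a minimal sketch of what the grader could look like. The function name `gradeWithAI`, the `GradeResult` shape, and the exact prompt are hypothetical, not the actual identifiers in this PR; only the model (`gpt-5-mini`) and the pass/fail-plus-reasoning contract come from the description above:

```ts
// Hypothetical grader sketch — gradeWithAI and GradeResult are
// illustrative names, not the PR's actual implementation.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface GradeResult {
  pass: boolean;
  reasoning: string;
}

export async function gradeWithAI(
  task: string,
  output: string
): Promise<GradeResult> {
  const response = await client.chat.completions.create({
    model: "gpt-5-mini", // kept cheap for now; any model would work
    messages: [
      {
        role: "system",
        content:
          'You are a strict grader. Reply with JSON only: {"pass": boolean, "reasoning": string}.',
      },
      {
        role: "user",
        content: `Task:\n${task}\n\nGenerated output:\n${output}\n\nDoes the output satisfy the task?`,
      },
    ],
  });
  // Parse the model's verdict and its short explanation.
  return JSON.parse(response.choices[0].message.content ?? "{}") as GradeResult;
}
```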
I have it set to use `gpt-5-mini` for now to keep costs low, but it could use any model. You invoke it simply as a unit test in a `grader.test.ts` file, so if the grading fails then the unit test fails. It logs the reasoning.

I then asked an AI to go through all the tasks, work out which are not covered entirely by the grader tests, and add the AI-based grading to those too. I asked it to give its reasoning as it went, and I smoke-tested a few of the results and they seem logical.
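For reference, an illustrative `grader.test.ts` showing the invocation. The test runner (Vitest here), the file paths, and the import of the sketched `gradeWithAI` helper are all assumptions, not necessarily how this PR lays things out:

```ts
// Illustrative grader.test.ts — assumes Vitest and the hypothetical
// gradeWithAI sketch above; real paths and layout may differ.
import { describe, expect, it } from "vitest";
import { readFileSync } from "node:fs";
import { gradeWithAI } from "./gradeWithAI";

describe("001/008 helper functions", () => {
  it("passes AI grading", async () => {
    const task = readFileSync("tasks/001/008/task.md", "utf8"); // hypothetical path
    const output = readFileSync("tasks/001/008/output.ts", "utf8"); // hypothetical path
    const { pass, reasoning } = await gradeWithAI(task, output);
    console.log(reasoning); // surfaces the grader's explanation in the test log
    expect(pass).toBe(true); // a failing grade fails the unit test
  });
});
```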