feat: Setup QA evaluation infrastructure and logging system #206
Conversation
Round 1 review done.
Submitting part A of round 2 review so you can start modifications
Done round 2 review
Round 3 review
Approving without further review
Ticket
https://navalabs.atlassian.net/browse/DST-759
Changes
Added new metrics module for QA evaluation:

New files:
- app/src/metrics/README.md - Module documentation
- app/src/metrics/cli.py - Evaluation CLI interface
- app/src/metrics/evaluation/ - Core evaluation modules
- app/src/metrics/models/ - Data models
- app/src/metrics/utils/ - Helper functions

Modified:
- app/Makefile - Added run-evaluation target
- .gitignore - Added patterns for logs and data

Context for reviewers
This PR implements the basic QA evaluation infrastructure as outlined in the tech spec. The system provides:
Testing
Run an evaluation using:
Additional parameters:
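The exact command and its parameters are elided above. As an illustrative sketch only (not the implementation in this PR), an evaluation run behind the `run-evaluation` Make target might reduce to a loop like this, writing output in the layout the PR description documents; the function and field names here are hypothetical:

```python
import datetime
import json
import pathlib
import uuid


def run_evaluation(qa_pairs, predict, log_root="app/src/metrics/logs"):
    """Hypothetical sketch: evaluate a predict(question) callable over QA pairs
    and write batch/results/metrics files under logs/YYYY-MM-DD/."""
    run_id = uuid.uuid4().hex[:8]
    out_dir = pathlib.Path(log_root) / datetime.date.today().isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)

    results = []
    for item in qa_pairs:
        predicted = predict(item["question"])
        results.append({
            "id": item["id"],
            "question": item["question"],
            "expected": item["answer"],
            "predicted": predicted,
            # Naive exact-match scoring, just for illustration.
            "correct": predicted.strip().lower() == item["answer"].strip().lower(),
        })

    # Run metadata, individual results (JSON Lines), and aggregated metrics.
    (out_dir / f"batch_{run_id}.json").write_text(
        json.dumps({"run_id": run_id, "count": len(results)}))
    with open(out_dir / f"results_{run_id}.jsonl", "w") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")
    accuracy = sum(r["correct"] for r in results) / len(results)
    (out_dir / f"metrics_{run_id}.json").write_text(
        json.dumps({"accuracy": accuracy}))
    return accuracy
```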
Results are stored in app/src/metrics/logs/YYYY-MM-DD/ with:
- batch_${UUID}.json - Run metadata
- results_${UUID}.jsonl - Individual results
- metrics_${UUID}.json - Aggregated metrics
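Given that file layout, the per-run outputs can be read back for inspection. A minimal loader sketch, assuming the file names follow the pattern above (the record fields themselves are hypothetical):

```python
import json
import pathlib


def load_run(log_dir, run_id):
    """Load one run's individual results (JSONL) and aggregated metrics (JSON)
    from a dated log directory, per the naming convention in the PR description."""
    d = pathlib.Path(log_dir)
    results = [
        json.loads(line)
        for line in (d / f"results_{run_id}.jsonl").read_text().splitlines()
        if line.strip()
    ]
    metrics = json.loads((d / f"metrics_{run_id}.json").read_text())
    return results, metrics
```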