feat: Setup QA evaluation infrastructure and logging system #206

Merged
fg-nava merged 46 commits into main from DST-759-QA-evaluation-logging
Feb 18, 2025

fg-nava commented Feb 6, 2025

Ticket

https://navalabs.atlassian.net/browse/DST-759

Changes

Added new metrics module for QA evaluation:

  • New files:

    • app/src/metrics/README.md - Module documentation
    • app/src/metrics/cli.py - Evaluation CLI interface
    • app/src/metrics/evaluation/ - Core evaluation modules
    • app/src/metrics/models/ - Data models
    • app/src/metrics/utils/ - Helper functions
  • Modified:

    • app/Makefile - Added run-evaluation target
    • .gitignore - Added patterns for logs and data

Context for reviewers

This PR implements the basic QA evaluation infrastructure as outlined in the tech spec. The system provides:

  • Structured logging of evaluation runs
  • CLI interface for running evaluations
  • Core metrics computation using recall@k
  • Configurable parameters for evaluation runs
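
For reviewers less familiar with the metric, recall@k can be sketched roughly as below. This is an illustration only, not the code in app/src/metrics/evaluation/metric_computation.py: the function name `recall_at_k`, the sample document IDs, and the assumption of one expected document per question are all hypothetical.

```python
from typing import Sequence


def recall_at_k(
    expected: Sequence[str],
    retrieved: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of questions whose expected item appears in the top-k results.

    Assumes exactly one expected item per question (a simplification
    made for this sketch, not something the PR states).
    """
    if len(expected) != len(retrieved):
        raise ValueError("expected and retrieved must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for exp, ranked in zip(expected, retrieved) if exp in ranked[:k])
    return hits / len(expected)


# Hypothetical data: three questions, each with a ranked list of retrieved IDs.
expected = ["doc_a", "doc_b", "doc_c"]
retrieved = [
    ["doc_a", "doc_x", "doc_y"],  # expected item at rank 1
    ["doc_x", "doc_y", "doc_b"],  # expected item at rank 3
    ["doc_x", "doc_y", "doc_z"],  # expected item not retrieved
]

for k in (1, 3):
    print(f"recall@{k} = {recall_at_k(expected, retrieved, k):.3f}")
```

Computing the metric for several values of k over the same ranked lists, as the `k=5,10,25` parameter suggests, only requires slicing each list at a different cutoff.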

Testing

Run an evaluation using:

make run-evaluation dataset=imagine_la k=5,10,25

Additional parameters:

make run-evaluation \
  dataset=imagine_la \
  k=5,10,25 \
  questions_file=path/to/questions.csv \
  sampling=0.1 \
  min_score=-1.0
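
One plausible reading of these parameters, sketched in Python: `k` is a comma-separated list of cutoffs and `sampling` is the fraction of questions to evaluate. Both readings are assumptions, as are the helper names `parse_k_values` and `sample_questions` and the `seed` argument; none of them are taken from app/src/metrics/cli.py. The `min_score` parameter is not covered here.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def parse_k_values(raw: str) -> List[int]:
    """Turn a comma-separated string such as "5,10,25" into [5, 10, 25]."""
    values = [int(part) for part in raw.split(",") if part.strip()]
    if not values or any(v <= 0 for v in values):
        raise ValueError(f"k must be a list of positive integers, got {raw!r}")
    return values


def sample_questions(
    questions: Sequence[T],
    fraction: Optional[float],
    seed: int = 0,  # hypothetical: a fixed seed keeps sampled runs repeatable
) -> List[T]:
    """Return a random subset of the questions; fraction=None keeps all of them."""
    if fraction is None or not questions:
        return list(questions)
    if not 0 < fraction <= 1:
        raise ValueError("sampling fraction must be in (0, 1]")
    count = max(1, round(len(questions) * fraction))
    return random.Random(seed).sample(list(questions), count)


print(parse_k_values("5,10,25"))
print(len(sample_questions(list(range(100)), 0.1)))
```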

Results are stored in app/src/metrics/logs/YYYY-MM-DD/ with:

  • batch_${UUID}.json - Run metadata
  • results_${UUID}.jsonl - Individual results
  • metrics_${UUID}.json - Aggregated metrics
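
The layout above could be produced and read back along these lines. This is a sketch under stated assumptions: only the directory and file naming come from this PR description, while the helper names `write_run` and `read_results` and every key in the sample payloads are hypothetical placeholders.

```python
import json
import tempfile
import uuid
from datetime import date
from pathlib import Path
from typing import Any, Dict, List


def write_run(
    log_root: Path,
    metadata: Dict[str, Any],
    results: List[Dict[str, Any]],
    metrics: Dict[str, Any],
) -> Path:
    """Write one run's three files into log_root/YYYY-MM-DD/ and return that directory."""
    run_id = uuid.uuid4()
    day_dir = log_root / date.today().isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    (day_dir / f"batch_{run_id}.json").write_text(json.dumps(metadata))
    # JSONL: one JSON object per line, so results can be appended and streamed.
    with (day_dir / f"results_{run_id}.jsonl").open("w") as fh:
        for row in results:
            fh.write(json.dumps(row) + "\n")
    (day_dir / f"metrics_{run_id}.json").write_text(json.dumps(metrics))
    return day_dir


def read_results(path: Path) -> List[Dict[str, Any]]:
    """Load a results_*.jsonl file into a list of dicts, skipping blank lines."""
    with path.open() as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Placeholder payloads; the real schema is defined elsewhere in the PR.
with tempfile.TemporaryDirectory() as tmp:
    day_dir = write_run(
        Path(tmp),
        metadata={"dataset": "imagine_la", "k": [5, 10, 25]},
        results=[{"question": "q1", "hit": True}, {"question": "q2", "hit": False}],
        metrics={"recall_at_5": 0.5},
    )
    results_file = next(day_dir.glob("results_*.jsonl"))
    print(len(read_results(results_file)), "result rows")
```

Writing individual results as JSONL keeps each result independent of the others, which suits per-question output.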

github-actions bot commented Feb 6, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines | Covered | Coverage | Threshold | Status
3303 | 3007 | 91% | 80% | 🟢

New Files

File | Coverage | Status
app/src/metrics/__init__.py | 100% | 🟢
app/src/metrics/cli.py | 94% | 🟢
app/src/metrics/evaluation/__init__.py | 100% | 🟢
app/src/metrics/evaluation/batch.py | 100% | 🟢
app/src/metrics/evaluation/logging.py | 95% | 🟢
app/src/metrics/evaluation/metric_computation.py | 100% | 🟢
app/src/metrics/evaluation/results.py | 100% | 🟢
app/src/metrics/evaluation/runner.py | 98% | 🟢
app/src/metrics/models/__init__.py | 100% | 🟢
app/src/metrics/models/metrics.py | 100% | 🟢
app/src/metrics/utils/__init__.py | 100% | 🟢
app/src/metrics/utils/jsonl_to_csv.py | 100% | 🟢
app/src/metrics/utils/timer.py | 100% | 🟢
TOTAL | 99% | 🟢

Modified Files

No covered modified files...

updated for commit: 8642e91 by action🐍

fg-nava marked this pull request as ready for review February 12, 2025 00:38
fg-nava force-pushed the DST-759-QA-evaluation-logging branch from accec49 to f4c0f5a February 12, 2025 00:51
app/Makefile (outdated, resolved)
app/Makefile (outdated, resolved)
fg-nava force-pushed the DST-759-QA-evaluation-logging branch from 76fd5ed to 9d61207 February 12, 2025 16:36
yoomlam left a comment

Round 1 review done.

app/Makefile (resolved)
app/Makefile (outdated, resolved)
.gitignore (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
fg-nava requested a review from yoomlam February 13, 2025 00:17
yoomlam left a comment

Submitting part A of round 2 review so you can start modifications.

app/Makefile (outdated, resolved)
app/Makefile (outdated, resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/README.md (resolved)
app/src/metrics/README.md (outdated, resolved)
app/src/metrics/evaluation/batch.py (resolved)
yoomlam left a comment

Done round 2 review

app/src/metrics/evaluation/results.py (resolved)
app/src/metrics/evaluation/results.py (outdated, resolved)
app/src/metrics/evaluation/results.py (outdated, resolved)
app/src/metrics/evaluation/results.py (outdated, resolved)
app/src/metrics/models/metrics.py (outdated, resolved)
app/src/metrics/evaluation/logging.py (outdated, resolved)
app/tests/src/metrics/evaluation/test_runner.py (outdated, resolved)
app/tests/src/metrics/test_cli.py (outdated, resolved)
app/tests/src/metrics/test_cli.py (outdated, resolved)
yoomlam left a comment

Round 3 review

app/src/metrics/utils/timer.py (outdated, resolved)
app/src/metrics/utils/__init__.py (outdated, resolved)
app/src/metrics/models/__init__.py (outdated, resolved)
app/src/metrics/evaluation/results.py (resolved)
app/src/metrics/evaluation/results.py (resolved)
app/src/metrics/README.md (outdated, resolved)
yoomlam requested a review from KevinJBoyer February 18, 2025 18:01
yoomlam left a comment

Approving without further review

fg-nava merged commit 85fc1a6 into main Feb 18, 2025
3 checks passed
fg-nava deleted the DST-759-QA-evaluation-logging branch February 18, 2025 21:15
2 participants