feat: Setup QA evaluation infrastructure and logging system #206

fg-nava · 2025-02-06T18:53:01Z

Ticket

https://navalabs.atlassian.net/browse/DST-759

Changes

Added new metrics module for QA evaluation:

New files:
- app/src/metrics/README.md - Module documentation
- app/src/metrics/cli.py - Evaluation CLI interface
- app/src/metrics/evaluation/ - Core evaluation modules
- app/src/metrics/models/ - Data models
- app/src/metrics/utils/ - Helper functions
Modified:
- app/Makefile - Added run-evaluation target
- .gitignore - Added patterns for logs and data

Context for reviewers

This PR implements the basic QA evaluation infrastructure as outlined in the tech spec. The system provides:

- Structured logging of evaluation runs
- CLI interface for running evaluations
- Core metrics computation using recall@k
- Configurable parameters for evaluation runs
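
For reviewers less familiar with the metric: recall@k can be read as the fraction of questions whose expected chunk appears among the top k retrieved results. The sketch below illustrates that reading only. It is not the code in metric_computation.py, and `recall_at_k` and its argument names are hypothetical.

```python
from typing import Iterable, Sequence


def recall_at_k(
    retrieved_ids: Sequence[Sequence[str]],
    expected_ids: Sequence[str],
    k_values: Iterable[int],
) -> dict[int, float]:
    """Return, for each k, the share of questions whose expected chunk is in the top k.

    retrieved_ids[i] is the ranked list of chunk ids returned for question i
    (best match first); expected_ids[i] is the single ground-truth chunk id.
    Both structures are assumptions made for this illustration.
    """
    if len(retrieved_ids) != len(expected_ids):
        raise ValueError("retrieved_ids and expected_ids must have the same length")

    total = len(expected_ids)
    scores: dict[int, float] = {}
    for k in k_values:
        # A question counts as a hit when its expected chunk is within the first k results.
        hits = sum(
            1
            for ranked, expected in zip(retrieved_ids, expected_ids)
            if expected in ranked[:k]
        )
        scores[k] = hits / total if total else 0.0
    return scores
```

Computing all k values in one pass over the same ranked lists keeps the numbers comparable across k, since every k is scored against the same retrieval run.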

Testing

Run an evaluation using:

make run-evaluation dataset=imagine_la k=5,10,25

Additional parameters:

make run-evaluation \
  dataset=imagine_la \
  k=5,10,25 \
  questions_file=path/to/questions.csv \
  sampling=0.1 \
  min_score=-1.0
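
As a rough illustration of how two of these parameters might be interpreted, the sketch below parses a comma-separated `k` value and draws a reproducible sample of questions. It assumes that `sampling=0.1` means "evaluate about 10% of the questions". That reading, along with the helper names `parse_k_values` and `sample_questions` and the `seed` argument, is hypothetical and not taken from cli.py.

```python
import random


def parse_k_values(raw: str) -> list[int]:
    """Turn a comma-separated string such as "5,10,25" into sorted, unique ints."""
    values = sorted({int(part) for part in raw.split(",") if part.strip()})
    if not values or values[0] < 1:
        raise ValueError(f"k values must be positive integers, got {raw!r}")
    return values


def sample_questions(questions: list[dict], fraction: float, seed: int = 0) -> list[dict]:
    """Return a reproducible random subset holding roughly `fraction` of the questions."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    # Keep at least one question so a small file still produces a result.
    count = max(1, round(len(questions) * fraction)) if questions else 0
    # A dedicated Random instance keeps the sample stable for a given seed.
    return random.Random(seed).sample(questions, count)
```

A fixed seed makes two runs over the same questions file select the same subset, which keeps sampled runs comparable.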

Results are stored in app/src/metrics/logs/YYYY-MM-DD/ with:

- batch_${UUID}.json - Run metadata
- results_${UUID}.jsonl - Individual results
- metrics_${UUID}.json - Aggregated metrics
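
To make the naming scheme concrete, here is a minimal sketch that builds the three file paths for one run. The directory layout and file-name patterns come from the description above. The `log_paths` helper and its signature are hypothetical and are not part of the module.

```python
from datetime import date
from pathlib import PurePosixPath


def log_paths(
    batch_id: str,
    run_date: date,
    base_dir: str = "app/src/metrics/logs",
) -> dict[str, PurePosixPath]:
    """Build the per-run log file paths under <base_dir>/YYYY-MM-DD/."""
    # date.isoformat() yields the YYYY-MM-DD directory name.
    day_dir = PurePosixPath(base_dir) / run_date.isoformat()
    return {
        "batch": day_dir / f"batch_{batch_id}.json",       # run metadata
        "results": day_dir / f"results_{batch_id}.jsonl",  # one result per line
        "metrics": day_dir / f"metrics_{batch_id}.json",   # aggregated metrics
    }
```

Because all three files share one UUID, a single identifier is enough to find everything a run produced, for example with `log_paths(str(uuid.uuid4()), date.today())`.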

github-actions · 2025-02-06T18:58:17Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
3303	3007	91%	80%	🟢

New Files

File	Coverage	Status
app/src/metrics/__init__.py	100%	🟢
app/src/metrics/cli.py	94%	🟢
app/src/metrics/evaluation/__init__.py	100%	🟢
app/src/metrics/evaluation/batch.py	100%	🟢
app/src/metrics/evaluation/logging.py	95%	🟢
app/src/metrics/evaluation/metric_computation.py	100%	🟢
app/src/metrics/evaluation/results.py	100%	🟢
app/src/metrics/evaluation/runner.py	98%	🟢
app/src/metrics/models/__init__.py	100%	🟢
app/src/metrics/models/metrics.py	100%	🟢
app/src/metrics/utils/__init__.py	100%	🟢
app/src/metrics/utils/jsonl_to_csv.py	100%	🟢
app/src/metrics/utils/timer.py	100%	🟢
TOTAL	99%	🟢

Modified Files

No covered modified files...

updated for commit: 8642e91 by action🐍

yoomlam

Round 1 review done.

app/Makefile

.gitignore

app/src/metrics/README.md

yoomlam

Submitting part A of round 2 review so you can start modifications

app/Makefile

app/src/metrics/README.md

app/src/metrics/evaluation/batch.py

app/tests/src/metrics/evaluation/test_logging.py

app/src/metrics/evaluation/metric_computation.py

yoomlam

Done round 2 review

app/src/metrics/evaluation/results.py

app/src/metrics/models/metrics.py

app/tests/src/metrics/evaluation/test_logging.py

app/src/metrics/evaluation/logging.py

app/tests/src/metrics/evaluation/test_runner.py

app/tests/src/metrics/test_cli.py

yoomlam

Round 3 review

app/src/metrics/utils/timer.py

app/src/metrics/utils/__init__.py

app/src/metrics/models/__init__.py

app/src/metrics/evaluation/metric_computation.py

app/src/metrics/evaluation/results.py

app/src/metrics/README.md

app/src/metrics/evaluation/runner.py

app/src/metrics/tech-spec.md

yoomlam

Approving without further review

fg-nava marked this pull request as ready for review February 12, 2025 00:38

fg-nava force-pushed the DST-759-QA-evaluation-logging branch from accec49 to f4c0f5a Compare February 12, 2025 00:51

yoomlam reviewed Feb 12, 2025

app/Makefile (outdated, resolved)

yoomlam reviewed Feb 12, 2025

app/Makefile (outdated, resolved)

fg-nava added 20 commits February 12, 2025 08:36

docs: Drafting tech spec for modeling retrieval evals

bd3e005

docs: edits from review by Dianna

dc57bb9

feat: Setup QA evaluation infrastructure and logging system

40e2ef1

feat: add jsonl_to_csv utility for converting evaluation results

731e814

refactor: remove embedding.py as we now use retrieval scores from app_config

1ad5099

refactor: update metrics evaluation to use retrieval scores and improve naming

1fec0ff

feat: add CSV conversion of results in evaluation logger

362b5ef

docs: update metrics README to reflect current functionality

71bf4a7

build: update run-evaluation make target to use commit hash and remove embedding params

370f8b5

refactor: add case-insensitive filtering and improve logging

6eb3627

refactor: reorganize metrics module structure

39afca0

test: add metrics tests for evaluation module

6c8927c

chore: remove old metrics.py after rename

3347cf0

feat: update cli.py to support new metrics structure

253d0f2

test: Add tests for metrics evaluation results and runner modules

382a99c

test: Add tests for metrics CLI and improve error handling

03d915c

style: Clean up imports in metrics test files

888ced3

fix: Mark MD5 hash as not used for security in metrics evaluation

fde328b

test: Add tests for timer utility

55907fb

refactor: move DB debugging commands to Makefile.local

9d61207

fg-nava force-pushed the DST-759-QA-evaluation-logging branch from 76fd5ed to 9d61207 Compare February 12, 2025 16:36

yoomlam requested changes Feb 12, 2025

fg-nava added 3 commits February 12, 2025 11:20

refactor: use PY_RUN_CMD in run-evaluation target for consistency

b59fdd3

docs: add log storage location documentation

c7c3f35

docs: improve CLI argument descriptions and add default QA file setup instructions

9d9ae3f

fg-nava and others added 7 commits February 12, 2025 13:25

nit: rename document_info to expected_chunk

510a80b

fix: CSV output to include expected content hash and flatten nested fields

41bc7ef

fix: restore git commit hash tracking in evaluation logs and fix retrieval function parameters

955af28

test: improve batch.py coverage with commit and package version tests

3f91640

Apply suggestions from code review

21c3344

Co-authored-by: Yoom Lam

docs: multiple README updates from code review feedback

cf8a484

fix: lint error in test_batch.py

703bf11

fg-nava requested a review from yoomlam February 13, 2025 00:17

yoomlam requested changes Feb 13, 2025

yoomlam reviewed Feb 13, 2025

app/src/metrics/README.md (outdated, resolved)

yoomlam reviewed Feb 13, 2025

app/src/metrics/evaluation/runner.py (resolved)

yoomlam reviewed Feb 13, 2025

app/src/metrics/tech-spec.md (outdated, resolved)

fg-nava and others added 6 commits February 14, 2025 08:28

Merge branch 'main' into DST-759-QA-evaluation-logging

1b6977a

Apply documentation suggestions from code review

7e5ec5c

Co-authored-by: Yoom Lam

docs: remove legacy details from docs

9fcd62a

Apply suggestions from code review

b782f60

Co-authored-by: Yoom Lam

Merge branch 'main' into DST-759-QA-evaluation-logging

3620885

fix: raise error when software info is not available

803d145

yoomlam requested a review from KevinJBoyer February 18, 2025 18:01

fg-nava added 4 commits February 18, 2025 11:07

Merge branch 'main' into DST-759-QA-evaluation-logging

731e89a

fix: change doc_info to expected_chunk

41c922d

refactor: switch to dataclasses.asdict() for serialization

11c12d1

refactor: combine nested context managers in test files

42ee7ec

yoomlam approved these changes Feb 18, 2025

fg-nava added 2 commits February 18, 2025 13:02

docs: more clarity in tech-spec on scope of retrievals evalutaion

73d0552

refactor: descriptive name for timer

8642e91

fg-nava merged commit 85fc1a6 into main Feb 18, 2025
3 checks passed

fg-nava deleted the DST-759-QA-evaluation-logging branch February 18, 2025 21:15
