Add auto-discovery for regression test datasets with --include-local flag #2312

jucor · 2025-11-25T18:39:22Z

Summary

This PR adds support for testing with local datasets that are not committed to the repository. This enables:

Confidential data testing - Run regression tests on private conversations without risking accidental commits
Large dataset testing - Test with larger conversations that would bloat the repository
Easy dataset addition - Just drop data in real_data/.local/ and it's auto-discovered, no config changes needed

Changes

Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
Add --include-local pytest flag to include git-ignored local datasets
Add .local/ to .gitignore for confidential/large datasets
Simplify datasets.py with DatasetInfo dataclass and discovery functions
Add conftest.py with pytest hooks for dynamic test parametrization
Update download_real_data.py to default to .local/ with --commit flag
Add unit tests for dataset discovery in test_datasets.py
Update tests/README.md with new documentation

Usage

# Default: run with committed datasets only
pytest tests/test_regression.py

# Include local datasets from real_data/.local/
pytest tests/test_regression.py --include-local

Test plan

Run pytest delphi/tests/test_datasets.py to verify dataset discovery
Run pytest delphi/tests/test_regression.py with committed datasets
Add a dataset to real_data/.local/ and verify it's discovered with --include-local
Verify .local/ directory is properly git-ignored

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR modernizes the regression testing infrastructure by implementing auto-discovery for test datasets and adding support for local, git-ignored datasets. The changes enable developers to test with confidential or large conversation data without risking accidental commits, while simplifying the process of adding new test datasets—just drop them in real_data/.local/ and they're automatically discovered.

Key changes:

Auto-discovery mechanism that scans directories matching pattern <report_id>-<name>/ in real_data/ and real_data/.local/
New --include-local pytest flag to opt-in to testing with local datasets
Refactored datasets.py with DatasetInfo dataclass and discovery functions replacing hardcoded configuration
Updated download script to default to .local/ directory with --commit flag for public datasets

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
delphi/polismath/regression/datasets.py	Core auto-discovery implementation with DatasetInfo dataclass and directory scanning logic
delphi/tests/conftest.py	pytest hooks for --include-local flag, dynamic test parametrization, and dataset summary reporting
delphi/tests/test_regression.py	Removed hardcoded parametrization in favor of dynamic discovery via conftest.py
delphi/tests/test_datasets.py	Unit tests for directory pattern matching, file checking, and dataset discovery logic
delphi/tests/download_real_data.py	New positional arguments (report_id, dataset_name) with --commit flag to control download location
delphi/tests/README.md	Updated documentation explaining auto-discovery, local datasets, and new download patterns
delphi/pyproject.toml	Added local_dataset marker for pytest
delphi/.gitignore	Added real_data/.local/ to git ignore list
delphi/polismath/regression/init.py	Updated exports to include new discovery functions and DatasetInfo class

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

delphi/scripts/regression_download.py

delphi/tests/test_regression.py

delphi/polismath/regression/datasets.py

delphi/tests/conftest.py

delphi/tests/download_real_data.py

delphi/tests/test_datasets.py

delphi/tests/test_regression.py

jucor · 2025-11-25T18:52:42Z

All Copilot review comments have been addressed:

In commit 30edbff:

✅ Use any() instead of bool(list()) in _check_files for better performance
✅ Add multiple match validation in find_file function
✅ Fix pytest.skip() during collection - use empty parametrize list instead
✅ Add directory context comment to documentation
✅ Remove unused pytest import from test_datasets.py
✅ Remove unused list_regression_datasets import from test_regression.py
✅ Rename TestDirPattern to TestDirectoryPattern for clarity
✅ Improve error message for missing dataset_name argument

In commit 80115f9:

✅ Add warning when local datasets shadow committed datasets with same name
✅ Add test for include_local=True behavior

jucor · 2025-11-25T18:56:21Z

Note: rebased all commits to sign them -- hence why the commits appear after the review. Same content, just added signature.

…flag - Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/ - Add --include-local pytest flag to include git-ignored local datasets - Add .local/ to .gitignore for confidential/large datasets - Simplify datasets.py with DatasetInfo dataclass and discovery functions - Add conftest.py with pytest hooks for dynamic test parametrization - Update download_real_data.py to default to .local/ with --commit flag - Add unit tests for dataset discovery in test_datasets.py - Update tests/README.md with new documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Use any() instead of bool(list()) in _check_files for efficiency - Add multiple match validation in find_file - Fix pytest.skip() during collection (use empty parametrize instead) - Add directory context comment to test_regression.py usage - Remove unused list_regression_datasets import - Rename TestDirPattern to TestDirectoryPattern - Improve error message in regression_download.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- Warn when local dataset shadows a committed dataset with same name - Add test for include_local=True behavior - Add test for name collision warning 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Allow datasets to be valid for regression testing without the Clojure math_blob file. This enables testing when database access is unavailable (e.g., when DATABASE_URL is not set). Changes: - DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot - Added has_clojure_reference property to check if Clojure comparison is possible - Updated documentation to clarify math_blob is optional - Added tests for new behavior 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

- test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"] with auto-discovery using clojure_dataset fixture. Only includes datasets with has_clojure_reference=True (i.e., have math_blob for Clojure comparison). Respects --include-local flag. - regression_download.py: After download, check for missing golden_snapshot.json and offer to create them interactively. Shows command to create later if user declines. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Clojure comparison tests only need votes, comments, and math_blob. They compare against the Clojure output, not the Python golden snapshot. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (9)

delphi/scripts/regression_download.py:363

The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
delphi/scripts/regression_download.py:17
The usage examples reference download_real_data.py but the script file is actually named regression_download.py. Update the references to use the correct filename to avoid confusion.
delphi/scripts/regression_download.py:31
The examples reference download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
delphi/scripts/regression_download.py:355
The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
delphi/scripts/regression_download.py:359
The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
delphi/scripts/regression_download.py:367
The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
delphi/scripts/regression_download.py:20
The usage example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename.
delphi/scripts/regression_download.py:26
The usage example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename.
delphi/scripts/regression_download.py:351
The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops. - bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR compdemocracy#2312 is merged. Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

jucor · 2025-11-26T13:45:51Z

Ready for human review and merge :-)

- _compute_vote_stats: Replace per-row/per-column loops with numpy vectorized operations using boolean masks and axis-based sums. This eliminates O(rows + cols) Python loops. - bench_update_votes.py: Make standalone by accepting CSV path directly instead of depending on datasets package. Add TODO for using datasets package once PR compdemocracy#2312 is merged. Combined with pivot_table optimization, achieves ~10x speedup on bg2050 dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

ballPointPenguin · 2025-12-01T23:43:02Z

Suggested Updates here: jucor#3

jucor requested a review from Copilot November 25, 2025 18:39

Copilot started reviewing on behalf of jucor November 25, 2025 18:39 View session

Copilot finished reviewing on behalf of jucor November 25, 2025 18:43

jucor force-pushed the local_datasets branch from 2667ae5 to f692f13 Compare November 25, 2025 18:43

Copilot AI reviewed Nov 25, 2025

View reviewed changes

jucor force-pushed the local_datasets branch from ca08d46 to 80115f9 Compare November 25, 2025 18:54

jucor requested a review from tevko November 25, 2025 18:56

jucor and others added 3 commits November 25, 2025 19:18

jucor force-pushed the local_datasets branch from 80115f9 to 143fe28 Compare November 25, 2025 19:18

jucor and others added 2 commits November 25, 2025 20:29

jucor force-pushed the local_datasets branch from 143fe28 to 6e0b5d7 Compare November 25, 2025 20:29

jucor force-pushed the local_datasets branch from 7ceb99c to 075797b Compare November 25, 2025 20:52

jucor requested a review from Copilot November 25, 2025 21:00

Copilot started reviewing on behalf of jucor November 25, 2025 21:01 View session

Copilot finished reviewing on behalf of jucor November 25, 2025 21:05

Copilot AI reviewed Nov 25, 2025

View reviewed changes

Fix name in examples as per GH Copilot review

7f38d94

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add auto-discovery for regression test datasets with --include-local flag #2312

Add auto-discovery for regression test datasets with --include-local flag #2312

Uh oh!

jucor commented Nov 25, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jucor commented Nov 25, 2025 •

edited

Loading

Uh oh!

jucor commented Nov 25, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

jucor commented Nov 26, 2025

Uh oh!

ballPointPenguin commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add auto-discovery for regression test datasets with --include-local flag #2312

Are you sure you want to change the base?

Add auto-discovery for regression test datasets with --include-local flag #2312

Uh oh!

Conversation

jucor commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Usage

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jucor commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jucor commented Nov 25, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

jucor commented Nov 26, 2025

Uh oh!

ballPointPenguin commented Dec 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jucor commented Nov 25, 2025 •

edited

Loading

jucor commented Nov 25, 2025 •

edited

Loading