Skip to content

Conversation

@jucor
Copy link
Collaborator

@jucor jucor commented Nov 25, 2025

Summary

This PR adds support for testing with local datasets that are not committed to the repository. This enables:

  1. Confidential data testing - Run regression tests on private conversations without risking accidental commits
  2. Large dataset testing - Test with larger conversations that would bloat the repository
  3. Easy dataset addition - Just drop data in real_data/.local/ and it's auto-discovered, no config changes needed

Changes

  • Auto-discover datasets from real_data/ and real_data/.local/ based on directory naming pattern <report_id>-<name>/
  • Add --include-local pytest flag to include git-ignored local datasets
  • Add .local/ to .gitignore for confidential/large datasets
  • Simplify datasets.py with DatasetInfo dataclass and discovery functions
  • Add conftest.py with pytest hooks for dynamic test parametrization
  • Update download_real_data.py to default to .local/ with --commit flag
  • Add unit tests for dataset discovery in test_datasets.py
  • Update tests/README.md with new documentation

Usage

# Default: run with committed datasets only
pytest tests/test_regression.py

# Include local datasets from real_data/.local/
pytest tests/test_regression.py --include-local

Test plan

  • Run pytest delphi/tests/test_datasets.py to verify dataset discovery
  • Run pytest delphi/tests/test_regression.py with committed datasets
  • Add a dataset to real_data/.local/ and verify it's discovered with --include-local
  • Verify .local/ directory is properly git-ignored

🤖 Generated with Claude Code

@jucor jucor requested a review from Copilot November 25, 2025 18:39
Copilot finished reviewing on behalf of jucor November 25, 2025 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modernizes the regression testing infrastructure by implementing auto-discovery for test datasets and adding support for local, git-ignored datasets. The changes enable developers to test with confidential or large conversation data without risking accidental commits, while simplifying the process of adding new test datasets—just drop them in real_data/.local/ and they're automatically discovered.

Key changes:

  • Auto-discovery mechanism that scans directories matching pattern <report_id>-<name>/ in real_data/ and real_data/.local/
  • New --include-local pytest flag to opt-in to testing with local datasets
  • Refactored datasets.py with DatasetInfo dataclass and discovery functions replacing hardcoded configuration
  • Updated download script to default to .local/ directory with --commit flag for public datasets

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
delphi/polismath/regression/datasets.py Core auto-discovery implementation with DatasetInfo dataclass and directory scanning logic
delphi/tests/conftest.py pytest hooks for --include-local flag, dynamic test parametrization, and dataset summary reporting
delphi/tests/test_regression.py Removed hardcoded parametrization in favor of dynamic discovery via conftest.py
delphi/tests/test_datasets.py Unit tests for directory pattern matching, file checking, and dataset discovery logic
delphi/tests/download_real_data.py New positional arguments (report_id, dataset_name) with --commit flag to control download location
delphi/tests/README.md Updated documentation explaining auto-discovery, local datasets, and new download patterns
delphi/pyproject.toml Added local_dataset marker for pytest
delphi/.gitignore Added real_data/.local/ to git ignore list
delphi/polismath/regression/init.py Updated exports to include new discovery functions and DatasetInfo class

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jucor
Copy link
Collaborator Author

jucor commented Nov 25, 2025

All Copilot review comments have been addressed:

In commit 30edbff:

  • ✅ Use any() instead of bool(list()) in _check_files for better performance
  • ✅ Add multiple match validation in find_file function
  • ✅ Fix pytest.skip() during collection - use empty parametrize list instead
  • ✅ Add directory context comment to documentation
  • ✅ Remove unused pytest import from test_datasets.py
  • ✅ Remove unused list_regression_datasets import from test_regression.py
  • ✅ Rename TestDirPattern to TestDirectoryPattern for clarity
  • ✅ Improve error message for missing dataset_name argument

In commit 80115f9:

  • ✅ Add warning when local datasets shadow committed datasets with same name
  • ✅ Add test for include_local=True behavior

@jucor
Copy link
Collaborator Author

jucor commented Nov 25, 2025

Note: rebased all commits to sign them -- hence why the commits appear after the review. Same content, just added signature.

@jucor jucor requested a review from tevko November 25, 2025 18:56
jucor and others added 3 commits November 25, 2025 19:18
…flag

- Auto-discover datasets from real_data/ and real_data/.local/ based on
  directory naming pattern <report_id>-<name>/
- Add --include-local pytest flag to include git-ignored local datasets
- Add .local/ to .gitignore for confidential/large datasets
- Simplify datasets.py with DatasetInfo dataclass and discovery functions
- Add conftest.py with pytest hooks for dynamic test parametrization
- Update download_real_data.py to default to .local/ with --commit flag
- Add unit tests for dataset discovery in test_datasets.py
- Update tests/README.md with new documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Use any() instead of bool(list()) in _check_files for efficiency
- Add multiple match validation in find_file
- Fix pytest.skip() during collection (use empty parametrize instead)
- Add directory context comment to test_regression.py usage
- Remove unused list_regression_datasets import
- Rename TestDirPattern to TestDirectoryPattern
- Improve error message in regression_download.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Warn when local dataset shadows a committed dataset with same name
- Add test for include_local=True behavior
- Add test for name collision warning

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
jucor and others added 2 commits November 25, 2025 20:29
Allow datasets to be valid for regression testing without the Clojure
math_blob file. This enables testing when database access is unavailable
(e.g., when DATABASE_URL is not set).

Changes:
- DatasetInfo.is_valid now only requires votes, comments, and golden_snapshot
- Added has_clojure_reference property to check if Clojure comparison is possible
- Updated documentation to clarify math_blob is optional
- Added tests for new behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- test_legacy_clojure_regression.py: Replace hardcoded ["biodiversity", "vw"]
  with auto-discovery using clojure_dataset fixture. Only includes datasets
  with has_clojure_reference=True (i.e., have math_blob for Clojure comparison).
  Respects --include-local flag.

- regression_download.py: After download, check for missing golden_snapshot.json
  and offer to create them interactively. Shows command to create later if
  user declines.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Clojure comparison tests only need votes, comments, and math_blob.
They compare against the Clojure output, not the Python golden snapshot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jucor jucor requested a review from Copilot November 25, 2025 21:00
Copilot finished reviewing on behalf of jucor November 25, 2025 21:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (9)

delphi/scripts/regression_download.py:363

  • The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
    delphi/scripts/regression_download.py:17
  • The usage examples reference download_real_data.py but the script file is actually named regression_download.py. Update the references to use the correct filename to avoid confusion.
    delphi/scripts/regression_download.py:31
  • The examples reference download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
    delphi/scripts/regression_download.py:355
  • The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
    delphi/scripts/regression_download.py:359
  • The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
    delphi/scripts/regression_download.py:367
  • The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.
    delphi/scripts/regression_download.py:20
  • The usage example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename.
    delphi/scripts/regression_download.py:26
  • The usage example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename.
    delphi/scripts/regression_download.py:351
  • The example references download_real_data.py but the script file is actually named regression_download.py. Update to use the correct filename python scripts/regression_download.py.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jucor added a commit to jucor/polis that referenced this pull request Nov 26, 2025
- _compute_vote_stats: Replace per-row/per-column loops with numpy
  vectorized operations using boolean masks and axis-based sums.
  This eliminates O(rows + cols) Python loops.

- bench_update_votes.py: Make standalone by accepting CSV path directly
  instead of depending on datasets package. Add TODO for using datasets
  package once PR compdemocracy#2312 is merged.

Combined with pivot_table optimization, achieves ~10x speedup on bg2050
dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jucor
Copy link
Collaborator Author

jucor commented Nov 26, 2025

Ready for human review and merge :-)

jucor added a commit to jucor/polis that referenced this pull request Nov 29, 2025
- _compute_vote_stats: Replace per-row/per-column loops with numpy
  vectorized operations using boolean masks and axis-based sums.
  This eliminates O(rows + cols) Python loops.

- bench_update_votes.py: Make standalone by accepting CSV path directly
  instead of depending on datasets package. Add TODO for using datasets
  package once PR compdemocracy#2312 is merged.

Combined with pivot_table optimization, achieves ~10x speedup on bg2050
dataset (1M votes): 18-30s -> 2.5s (~400k votes/sec).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@ballPointPenguin
Copy link
Member

Suggested Updates here: jucor#3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants