- New dependency: intspan
- New Prodigy recipe for adjudicating text annotations
- Refactored recipes to use Prodigy API
- Extended recipes to optionally fetch media (i.e., images)
- Added unit testing
- Fixed Codecov integration
- Now requires Python 3.12
- Basic README documentation for filter script
- New script for OCR with Google Vision
- Updated filter script:
  - Uses PPA work ids instead of source ids
  - Additional filtering by volume and page
  - Additional filtering by include or exclude key-value pairs
- New utility function for working with PPA corpus file paths
- New script for generating a PPA page subset for use with the filter script
- New script for adding image relative paths to a PPA text corpus
- New Prodigy recipes and custom CSS for image and text annotation
- Script to add PPA work-level metadata for display in Prodigy
- Ruff pre-commit hook now configured to autofix import order
- Utility to filter the full text corpus by source ID
- Experimental scripts:
  - OCR evaluation
  - Character-level statistics
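
The include/exclude key-value filtering noted above for the filter script can be pictured roughly as follows. This is a minimal sketch, not the script's actual code: the function name `keep_page`, the record fields (`work_id`, `label`, `order`), and the rule that every include pair must match while no exclude pair may match are all assumptions made for illustration.

```python
def keep_page(page, include=None, exclude=None):
    """Return True if a page record passes the include/exclude filters.

    Hypothetical helper: assumes a page is a plain dict, that ALL include
    pairs must match, and that ANY matching exclude pair drops the page.
    """
    include = include or {}
    exclude = exclude or {}
    # Reject the page if any include pair is missing or has a different value.
    if any(page.get(key) != value for key, value in include.items()):
        return False
    # Reject the page if any exclude pair matches.
    if any(page.get(key) == value for key, value in exclude.items()):
        return False
    return True


# Illustrative records only; the field names are assumptions.
pages = [
    {"work_id": "A01", "label": "poetry", "order": 1},
    {"work_id": "A01", "label": "prose", "order": 2},
    {"work_id": "B02", "label": "poetry", "order": 1},
]

kept = [
    page
    for page in pages
    if keep_page(page, include={"label": "poetry"}, exclude={"work_id": "B02"})
]
```

With these sample records, only the first page is kept: the second fails the include pair and the third matches the exclude pair.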