@numericunderflow06 (Collaborator) commented Nov 22, 2025

Name of the PR

♻️ Current situation & Problem

Link any open issues or pull requests (PRs) related to this PR. Please ensure that all non-trivial PRs are first tracked and discussed in an existing GitHub issue or discussion.

⚙️ Release Notes

Add a bullet-point list summarizing the feature, plus a migration guide if this is a breaking change, so that this section can be added to the release notes.
Include code snippets that demonstrate the implemented feature, or links to the documentation if it extends or changes the public interface.

📚 Documentation

Please ensure that you properly document any additions in conformance with the Spezi Documentation Guide.
You can use this section to describe your solution, but we encourage you to document your reasoning and changes using in-line documentation.

✅ Testing

Please ensure that the PR meets the testing requirements set by CodeCov and that new functionality is appropriately tested.
Use this section to describe important information about the tests and to explain why some elements might not be testable.

Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines.

numericunderflow06 and others added 17 commits November 20, 2025 17:22
- Updated .gitignore to exclude data/ directories and download logs
- Downloaded and set up M4 dataset (163 MB, 100k samples)
- TSQA was already in the HuggingFace cache (181 MB, 48k samples)
- Added dataset_setup_summary.md with details

Added evaluation results for two experiments:
1. stage_tsexam_eval: Sequential curriculum training results (stage1 TSQA -> stage2 M4 -> stage3 TimeSeriesExam eval)
2. tsqa_on_ts_exam: Stage1 checkpoint evaluated on TimeSeriesExam dataset (39.33% accuracy)

Results include:
- metrics.json: Accuracy and sample counts
- test_predictions.jsonl: All predictions with gold answers
- test_predictions_rank_0.jsonl: Per-rank predictions
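As a rough cross-check, the accuracy reported in metrics.json could be recomputed from test_predictions.jsonl. The sketch below is a hypothetical helper, not code from this PR: it assumes each JSONL line is an object with the model's answer under a "prediction" key and the reference answer under a "gold" key, and that answers count as matching after trimming whitespace and lowercasing. The actual field names and matching rule used by eval_stage1_on_tsexam.py may differ.

```python
import json


def accuracy_from_jsonl_lines(lines, pred_key="prediction", gold_key="gold"):
    """Compute accuracy over JSONL records.

    The default key names are hypothetical. Each non-blank line is parsed
    as a JSON object, and the answer under `pred_key` is compared with the
    reference under `gold_key` after trimming whitespace and lowercasing.
    """
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        record = json.loads(line)
        total += 1
        pred = str(record[pred_key]).strip().lower()
        gold = str(record[gold_key]).strip().lower()
        if pred == gold:
            correct += 1
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "correct": correct, "total": total}


def accuracy_from_jsonl_file(path, **keys):
    """Score a predictions file such as test_predictions.jsonl."""
    with open(path, encoding="utf-8") as f:
        return accuracy_from_jsonl_lines(f, **keys)
```

Returning the raw correct and total counts alongside the ratio makes it possible to sum per-rank files (e.g. test_predictions_rank_0.jsonl) before dividing, assuming each sample appears in exactly one rank's output.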

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Added SPDX license headers to:
- .gitignore
- eval_stage1_on_tsexam.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Added SPDX license headers to all .txt log files:
- eval_stage1_on_tsexam_fixed_log.txt
- eval_stage1_on_tsexam_log.txt
- eval_stage1_simple_prompt_log.txt
- sanity_check_log.txt
- training_with_fix_log.txt

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>