TS-exam #35
Open: numericunderflow06 wants to merge 17 commits into StanfordBDHG:main from numericunderflow06:ts-exam-2
- Updated .gitignore to exclude data/ directories and download logs
- Downloaded and set up the M4 dataset (163 MB, 100k samples)
- TSQA is already in the HuggingFace cache (181 MB, 48k samples)
- Added dataset_setup_summary.md with details
Added evaluation results for two experiments:
1. stage_tsexam_eval: sequential curriculum training results (stage1 TSQA -> stage2 M4 -> stage3 TimeSeriesExam eval)
2. tsqa_on_ts_exam: stage1 checkpoint evaluated on the TimeSeriesExam dataset (39.33% accuracy)
Results include:
- metrics.json: accuracy and sample counts
- test_predictions.jsonl: all predictions with gold answers
- test_predictions_rank_0.jsonl: per-rank predictions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
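To cross-check the accuracy stored in metrics.json against the raw predictions, a reviewer could recompute it from the JSONL files. The sketch below is illustrative only: it assumes each line is a JSON object with hypothetical `prediction` and `gold` fields (the actual key names in test_predictions.jsonl are not shown here) and uses exact string match after whitespace stripping. The evaluation script's real scoring rule may differ.

```python
import glob
import json


def read_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def accuracy(records, pred_key="prediction", gold_key="gold"):
    """Return (accuracy, sample_count) using exact match after stripping whitespace.

    pred_key and gold_key are hypothetical field names; adjust them to the real schema.
    """
    n = len(records)
    if n == 0:
        return 0.0, 0
    correct = sum(
        str(r[pred_key]).strip() == str(r[gold_key]).strip() for r in records
    )
    return correct / n, n


def merge_rank_files(pattern="test_predictions_rank_*.jsonl"):
    """Concatenate per-rank prediction files (e.g. test_predictions_rank_0.jsonl)."""
    records = []
    for path in sorted(glob.glob(pattern)):
        records.extend(read_jsonl(path))
    return records
```

For example, `acc, n = accuracy(read_jsonl("test_predictions.jsonl"))` yields a fraction and a sample count that can be compared with metrics.json. When merging per-rank files, note that a distributed sampler that pads the dataset to an even split across ranks may duplicate a few samples, so deduplicating by a sample ID (if one is stored) is worth considering.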
Added SPDX license headers to:
- .gitignore
- eval_stage1_on_tsexam.py
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Added SPDX license headers to all .txt log files:
- eval_stage1_on_tsexam_fixed_log.txt
- eval_stage1_on_tsexam_log.txt
- eval_stage1_simple_prompt_log.txt
- sanity_check_log.txt
- training_with_fix_log.txt
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
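Adding headers by hand to many files is error-prone, so a small idempotent helper can be useful. The following is a minimal sketch, not the project's actual tooling. `HEADER_LINES` is a hypothetical placeholder (the real copyright holder, year, and license used in this repository are not stated here), and the `#` comment prefix is an assumption that suits .gitignore and Python files. For plain .txt logs, the REUSE convention of a separate `<file>.license` sidecar file is an alternative to embedding a comment.

```python
from pathlib import Path

# Hypothetical placeholder header; replace with the project's actual SPDX header text.
HEADER_LINES = [
    "SPDX-FileCopyrightText: <year> <copyright holder>",
    "",
    "SPDX-License-Identifier: <license>",
]


def has_spdx_header(text, max_lines=10):
    """Return True if an SPDX-License-Identifier tag appears near the top of the file."""
    head = text.splitlines()[:max_lines]
    return any("SPDX-License-Identifier:" in line for line in head)


def with_spdx_header(text, comment_prefix="#"):
    """Prepend the commented header unless one is already present (idempotent)."""
    if has_spdx_header(text):
        return text
    header = "\n".join(f"{comment_prefix} {line}".rstrip() for line in HEADER_LINES)
    return header + "\n\n" + text


def add_header_to_file(path, comment_prefix="#"):
    """Rewrite the file in place if needed; return True if it was changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    updated = with_spdx_header(original, comment_prefix)
    if updated != original:
        p.write_text(updated, encoding="utf-8")
        return True
    return False
```

Because `with_spdx_header` returns the text unchanged when a tag is already present, running the helper twice over the same files does not stack duplicate headers.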
Name of the PR
♻️ Current situation & Problem
Link any open issues or pull requests (PRs) related to this PR. Please ensure that all non-trivial PRs are first tracked and discussed in an existing GitHub issue or discussion.
⚙️ Release Notes
Add a bullet-point summary of the feature and, if this is a breaking change, possible migration guides, so this section can be added to the release notes.
Include code snippets that provide examples of the implemented feature, or links to the documentation if it extends or changes the public interface.
📚 Documentation
Please ensure that you properly document any additions in conformance with the Spezi Documentation Guide.
You can use this section to describe your solution, but we encourage you to document your reasoning and changes using inline documentation.
✅ Testing
Please ensure that the PR meets the testing requirements set by CodeCov and that new functionality is appropriately tested.
This section describes important information about the tests and why some elements might not be testable.
Code of Conduct & Contributing Guidelines
By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines: