TS-exam #35

numericunderflow06 · 2025-11-22T08:20:56Z

Name of the PR

♻️ Current situation & Problem

Link any open issues or pull requests (PRs) related to this PR. Please ensure that all non-trivial PRs are first tracked and discussed in an existing GitHub issue or discussion.

⚙️ Release Notes

Add a bullet point list summary of the feature and possible migration guides if this is a breaking change so this section can be added to the release notes.
Include code snippets that provide examples of the feature implemented or links to the documentation if it appends or changes the public interface.

📚 Documentation

Please ensure that you properly document any additions in conformance to Spezi Documentation Guide.
You can use this section to describe your solution, but we encourage contributors to document your reasoning and changes using in-line documentation.

✅ Testing

Please ensure that the PR meets the testing requirements set by CodeCov and that new functionality is appropriately tested.
This section describes important information about the tests and why some elements might not be testable.

Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines:

I agree to follow the Code of Conduct and Contributing Guidelines.

Added evaluation results for two experiments:

1. stage_tsexam_eval: Sequential curriculum training results (stage1 TSQA -> stage2 M4 -> stage3 TimeSeriesExam eval)
2. tsqa_on_ts_exam: Stage1 checkpoint evaluated on TimeSeriesExam dataset (39.33% accuracy)

Results include:

- metrics.json: Accuracy and sample counts
- test_predictions.jsonl: All predictions with gold answers
- test_predictions_rank_0.jsonl: Per-rank predictions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
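For readers who want to check a reported accuracy against a predictions file, a minimal sketch follows. It assumes each JSONL line is an object holding a predicted answer and a gold answer. The field names `prediction` and `gold` and the exact-match comparison are assumptions for illustration, not the schema or scoring rule confirmed by this PR.

```python
import json
from pathlib import Path


def accuracy_from_jsonl(path, pred_key="prediction", gold_key="gold"):
    """Return (accuracy, n_samples) for a JSONL file of predictions.

    pred_key and gold_key are hypothetical field names; adjust them to
    whatever the real prediction files contain.
    """
    correct = 0
    total = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            total += 1
            # Exact match after whitespace stripping (an assumed scoring rule).
            if str(record[pred_key]).strip() == str(record[gold_key]).strip():
                correct += 1
    accuracy = correct / total if total else 0.0
    return accuracy, total
```

Calling `accuracy_from_jsonl("test_predictions.jsonl")` would return a fraction between 0 and 1 together with the sample count, which can be compared with the values stored in metrics.json.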

Added SPDX license headers to all .txt log files:

- eval_stage1_on_tsexam_fixed_log.txt
- eval_stage1_on_tsexam_log.txt
- eval_stage1_simple_prompt_log.txt
- sanity_check_log.txt
- training_with_fix_log.txt

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
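As an illustration of what adding such headers involves, the sketch below prepends REUSE-style SPDX tags to a file's text. The helper name `add_spdx_header`, the `# ` comment prefix, and the copyright holder and license identifier passed in are all placeholders. The PR does not state which values or tooling were actually used.

```python
def add_spdx_header(text, copyright_holder, license_id, comment_prefix="# "):
    """Prepend REUSE-style SPDX tags unless a license identifier is already present.

    copyright_holder and license_id are caller-supplied placeholders; this
    sketch does not special-case shebang lines or non-'#' comment styles.
    """
    if "SPDX-License-Identifier:" in text:
        return text  # already tagged, leave unchanged
    header = (
        f"{comment_prefix}SPDX-FileCopyrightText: {copyright_holder}\n"
        f"{comment_prefix}SPDX-License-Identifier: {license_id}\n"
        "\n"
    )
    return header + text
```

The function is idempotent: running it twice on the same text leaves a single header, which makes it safe to apply across a batch of log files.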

numericunderflow06 and others added 17 commits November 20, 2025 17:22

Add dataset setup for M4 and TSQA

673483a

- Updated .gitignore to exclude data/ directories and download logs
- Downloaded and set up M4 dataset (163 MB, 100k samples)
- TSQA already in HuggingFace cache (181 MB, 48k samples)
- Added dataset_setup_summary.md with details

.

8cf9d46

include ts-exam

65eb78d

add ts-exam in the curriculum

b7115ac

training

3ea0096

add merged data training code

738099f

upload eval with sequential training

0cb8c51

Merge branch 'StanfordBDHG:main' into ts-exam-2

74eca8f

update eval

f857646

Merge branch 'ts-exam-2' of https://github.com/numericunderflow06/Ope…

73aa771

…nTSLM into ts-exam-2

add documentation

58c0d54

add licence

37a5459

Add missing REUSE license headers to .gitignore and eval script

a2c1670

Added SPDX license headers to:

- .gitignore
- eval_stage1_on_tsexam.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

.

3a0e2ea

update accuracy

f981f7e
