@vukrosic
Contributor

@vukrosic vukrosic commented Oct 9, 2025

No description provided.

vukrosic and others added 7 commits October 9, 2025 18:26
…oint loading functionality. Added command-line arguments for GPU selection and for resuming from checkpoints, allowing flexible training on different hardware setups.
…mode functionality. Adjusted the model architecture to fit within B200 memory constraints, modified the batch size and sequence length, and implemented a command-line argument for test mode to streamline experimentation with minimal data and steps.
…djusted model parameters for improved performance, including increased hidden dimensions and expert count. Modified test-mode settings to allow for longer training steps, and added an inference demo for generating sample text during testing.
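
For readers without the diff at hand, the command-line options these commits describe (GPU selection, resuming from a checkpoint, test mode) could look roughly like the sketch below. The flag names `--gpus`, `--resume`, and `--test`, along with the helpers `build_parser` and `parse_gpu_ids`, are hypothetical and chosen only for illustration; the actual names in this PR are not shown in the conversation.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flags mirroring the commit messages."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    # Assumed flag: comma-separated GPU indices, e.g. "0,1".
    parser.add_argument("--gpus", type=str, default="0",
                        help="comma-separated GPU indices to train on")
    # Assumed flag: path of a checkpoint to resume from (None starts fresh).
    parser.add_argument("--resume", type=str, default=None,
                        help="path to a checkpoint to resume from")
    # Assumed flag: run with minimal data and steps for quick experiments.
    parser.add_argument("--test", action="store_true",
                        help="enable test mode")
    return parser


def parse_gpu_ids(spec: str) -> List[int]:
    """Turn a string such as "0,1" into a list of integer GPU indices."""
    return [int(part) for part in spec.split(",") if part.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("GPUs:", parse_gpu_ids(args.gpus))
    print("Resume from:", args.resume)
    print("Test mode:", args.test)
```

Under these assumptions, an invocation such as `python train.py --gpus 0,1 --test` would select two GPUs and run in test mode; `train.py` is likewise a placeholder script name.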
2 participants