Add-xformer #56
Open
RohanKhanBD wants to merge 12 commits into Open-Superintelligence-Lab:main from RohanKhanBD:add-xformer
Conversation
* Enhance T4 GPU optimization and update documentation
  - Updated README.md to reflect T4 optimization, emphasizing single GPU training capabilities.
  - Modified adaptive_moe_config.py to disable FP8 support and adjust parameters for T4 optimization.
  - Refined auto_config.py to ensure proper configuration for single T4 GPU usage.
  - Removed unnecessary multi-GPU logic from train_auto.py and trainer.py, focusing on single T4 GPU training.
  - Streamlined matmul operations in ops/matmul to prioritize T4 optimization and removed unused implementations.
  - Updated GPU_ADAPTIVE_README.md to clarify T4-specific optimizations.
* Refactor for single T4 GPU optimization and remove Megatron support
  - Updated requirements.txt to remove the Megatron dependency.
  - Modified adaptive_moe_config.py to disable Megatron-related parameters and clarify single GPU training settings.
  - Adjusted auto_config.py to eliminate Megatron options and ensure compatibility with a single T4 GPU.
  - Cleaned up train_auto.py and adaptive_llm.py to remove Megatron logic, focusing on native training for T4.
  - Deleted megatron_wrapper.py as it is no longer needed for single GPU training.
  - Updated matmul operations to reflect T4 optimizations and removed unused BF16 support.
* Update adaptive MoE configuration and matmul operations for T4 optimization
  - Changed `use_adaptive_matmul` to `use_fp16_matmul` in `adaptive_moe_config.py` to clarify the use of FP16 matmul operations on T4 (see the sketch after this list).
  - Updated feature support checks in `adaptive_moe_config.py` and `adaptive_llm.py` to reflect the new FP16 configuration.
  - Simplified matmul operations in `ops/matmul/__init__.py` to focus on T4-optimized implementations, removing the registry pattern and unused BF16 support.
  - Adjusted `auto_config.py` to ensure consistent use of the T4 configuration across the codebase.
* Refactor adaptive LLM for T4 optimization and remove speedrun components
  - Removed adaptive layer imports in `adaptive_llm.py`, replacing them with standard PyTorch components for T4 compatibility.
  - Updated model documentation to reflect T4-specific optimizations, including FP16 precision.
  - Optimized token embeddings, transformer blocks, and output layers for the Tesla T4 GPU.
  - Deleted speedrun-related files and configurations to streamline the codebase and focus on T4 optimizations.
* delete
* Refactor auto configuration and training for T4 optimization
  - Updated the `AutoConfig` class in `auto_config.py` to reflect T4-specific optimizations, removing GPU-related parameters.
  - Simplified dataset sizing logic in `train_auto.py` to standardize for the T4 GPU, ensuring consistent model configuration.
  - Enhanced documentation to clarify the T4 optimization focus and training settings.
* Refactor training and configuration for T4 optimization
  - Renamed `train_auto.py` to `train_t4.py` and updated references throughout the codebase to reflect T4-specific training.
  - Removed `auto_config.py` and replaced it with `t4_config.py` for T4-specific configuration logic.
  - Updated `README.md` to clarify the training process for a single T4 GPU and adjusted setup instructions accordingly.
  - Enhanced `inference.py` to reference the new T4 configuration and training script.
  - Streamlined the setup script to align with the new T4-focused structure.
* Refactor training script and update documentation for T4 optimization
  - Renamed `train_t4.py` to `train.py` to streamline the training process for a single T4 GPU.
  - Updated `README.md` to reflect the new training command and clarify the T4-specific training setup.
  - Removed the deprecated `train_t4.py` file and adjusted references in the codebase to point to the new `train.py`.
  - Enhanced error messages in `inference.py` to guide users to the correct training script.
* Refactor configuration and model components for T4 optimization
  - Replaced `AdaptiveMoEModelConfig` with `T4MoEModelConfig` across various modules to align with T4-specific optimizations.
  - Updated model classes and functions to reflect the new T4 configuration, including changes in the training script and data loading.
  - Removed the deprecated `adaptive_llm.py` file to streamline the codebase and focus on T4-compatible implementations.
  - Adjusted imports and class definitions in layers and components to use the T4-optimized versions.
  - Enhanced documentation to clarify the focus on T4 optimizations and updated relevant function signatures.
* Refactor model components and configurations for T4 optimization
  - Updated imports and class definitions in `configs/__init__.py` to replace RTX-specific configurations with T4-optimized alternatives.
  - Modified the `MultiHeadAttention`, `T4Linear`, and `T4Embedding` classes in `models/components.py` and `models/layers.py` to reflect T4 architecture optimizations.
  - Adjusted weight initialization and scaling factors in various model classes to align with T4 specifications.
  - Enhanced documentation comments to clarify T4-specific optimizations across the codebase.
* Refactor configurations and model components for T4 optimization
  - Removed `AdaptiveMoEModelConfig` and related configurations to streamline the codebase for T4 compatibility.
  - Updated `__init__.py` to exclude `get_development_config` and adjusted imports accordingly.
  - Replaced `create_adaptive_linear` with `create_t4_linear` in model components to align with the T4 architecture.
  - Simplified dtype handling in matmul operations to focus on T4-specific optimizations.
  - Cleaned up the system information output by removing FP8 and BF16 support details.
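To make the FP16 matmul change above concrete, here is a minimal sketch of what a single T4-oriented FP16 path could look like. It is an illustration under stated assumptions: `t4_matmul` and the standalone `use_fp16_matmul` flag are hypothetical names echoing the commit messages, not code from this PR. The hardware rationale is real, though: the T4 (compute capability 7.5) has FP16 Tensor Cores but no BF16 or FP8 support, which is consistent with dropping the BF16/FP8 branches and the kernel registry.

```python
# Hypothetical sketch: t4_matmul / use_fp16_matmul mirror the commit
# messages but are NOT the PR's actual implementation.
import torch


def t4_matmul(a: torch.Tensor, b: torch.Tensor,
              use_fp16_matmul: bool = True) -> torch.Tensor:
    """Matmul with a single FP16 fast path for the Tesla T4.

    The T4 supports FP16 Tensor Cores but not BF16 or FP8, so one
    FP16 branch can replace a registry of dtype-specific kernels.
    """
    if use_fp16_matmul and a.is_cuda and b.is_cuda:
        # Run the Tensor Core path in FP16, then cast back so callers
        # keep their original dtype.
        return (a.half() @ b.half()).to(a.dtype)
    # CPU / opted-out fallback: plain matmul in the original dtype.
    return a @ b


if __name__ == "__main__":
    x = torch.randn(8, 16)
    w = torch.randn(16, 32)
    print(t4_matmul(x, w).shape)  # torch.Size([8, 32])
```

With only one dispatch decision left, a registry pattern has nothing to dispatch over, which matches its removal in the commits above.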
RohanKhanBD (Contributor, Author) commented on Nov 15, 2025:
The useless line changes are from the Ruff VS Code extension.
RohanKhanBD (Contributor, Author) commented:
Wait, you already made a commit that adds xformers (#46), so why not merge that?
Added xformers from this issue: #45
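As a rough illustration of what wiring in xformers typically looks like, here is a minimal sketch built on the library's public `memory_efficient_attention` op. Only that call is the real API; the wrapper function and tensor shapes are assumptions for the example, not this repository's code.

```python
# Illustrative sketch; only xformers.ops.memory_efficient_attention is
# the real API -- the wrapper and shapes are assumptions, not repo code.
import torch
import xformers.ops as xops


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # xformers expects (batch, seq_len, num_heads, head_dim) and selects
    # a fused memory-efficient kernel that the current GPU supports.
    return xops.memory_efficient_attention(q, k, v)


if __name__ == "__main__":
    B, S, H, D = 2, 128, 8, 64
    q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    print(attention(q, k, v).shape)  # torch.Size([2, 128, 8, 64])
```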