Add-xformer #56
Open
RohanKhanBD wants to merge 12 commits into Open-Superintelligence-Lab:main from RohanKhanBD:add-xformer
Conversation
* Enhance T4 GPU optimization and update documentation
  - Updated README.md to reflect T4 optimization, emphasizing single GPU training capabilities.
  - Modified adaptive_moe_config.py to disable FP8 support and adjust parameters for T4 optimization.
  - Refined auto_config.py to ensure proper configuration for single T4 GPU usage.
  - Removed unnecessary multi-GPU logic from train_auto.py and trainer.py, focusing on single T4 GPU training.
  - Streamlined matmul operations in ops/matmul to prioritize T4 optimization and removed unused implementations.
  - Updated GPU_ADAPTIVE_README.md to clarify T4-specific optimizations.
* Refactor for single T4 GPU optimization and remove Megatron support
  - Updated requirements.txt to remove the Megatron dependency.
  - Modified adaptive_moe_config.py to disable Megatron-related parameters and clarify single GPU training settings.
  - Adjusted auto_config.py to eliminate Megatron options and ensure compatibility with a single T4 GPU.
  - Cleaned up train_auto.py and adaptive_llm.py to remove Megatron logic, focusing on native training for T4.
  - Deleted megatron_wrapper.py as it is no longer needed for single GPU training.
  - Updated matmul operations to reflect T4 optimizations and removed unused BF16 support.
* Update adaptive MoE configuration and matmul operations for T4 optimization
  - Changed `use_adaptive_matmul` to `use_fp16_matmul` in `adaptive_moe_config.py` to clarify the use of FP16 matmul operations on T4 (see the sketch after this list).
  - Updated feature support checks in `adaptive_moe_config.py` and `adaptive_llm.py` to reflect the new FP16 configuration.
  - Simplified matmul operations in `ops/matmul/__init__.py` to focus on T4-optimized implementations, removing the registry pattern and unused BF16 support.
  - Adjusted `auto_config.py` to ensure consistent use of the T4 configuration across the codebase.
* Refactor adaptive LLM for T4 optimization and remove speedrun components
  - Removed adaptive layer imports in `adaptive_llm.py`, replacing them with standard PyTorch components for T4 compatibility.
  - Updated model documentation to reflect T4-specific optimizations, including FP16 precision.
  - Optimized token embeddings, transformer blocks, and output layers for the Tesla T4 GPU.
  - Deleted speedrun-related files and configurations to streamline the codebase and focus on T4 optimizations.
* delete
* Refactor auto configuration and training for T4 optimization
  - Updated the `AutoConfig` class in `auto_config.py` to reflect T4-specific optimizations, removing GPU-related parameters.
  - Simplified dataset sizing logic in `train_auto.py` to standardize for the T4 GPU, ensuring consistent model configuration.
  - Enhanced documentation to clarify the T4 optimization focus and training settings.
* Refactor training and configuration for T4 optimization
  - Renamed `train_auto.py` to `train_t4.py` and updated references throughout the codebase to reflect T4-specific training.
  - Removed `auto_config.py` and replaced it with `t4_config.py` for T4-specific configuration logic.
  - Updated `README.md` to clarify the training process for a single T4 GPU and adjusted setup instructions accordingly.
  - Enhanced `inference.py` to reference the new T4 configuration and training script.
  - Streamlined the setup script to align with the new T4-focused structure.
* Refactor training script and update documentation for T4 optimization
  - Renamed `train_t4.py` to `train.py` to streamline the training process for a single T4 GPU.
  - Updated `README.md` to reflect the new training command and clarify the T4-specific training setup.
  - Removed the deprecated `train_t4.py` file and adjusted references in the codebase to point to the new `train.py`.
  - Enhanced error messages in `inference.py` to guide users to the correct training script.
* Refactor configuration and model components for T4 optimization
  - Replaced `AdaptiveMoEModelConfig` with `T4MoEModelConfig` across various modules to align with T4-specific optimizations.
  - Updated model classes and functions to reflect the new T4 configuration, including changes in the training script and data loading.
  - Removed the deprecated `adaptive_llm.py` file to streamline the codebase and focus on T4-compatible implementations.
  - Adjusted imports and class definitions in layers and components to use the T4-optimized versions.
  - Enhanced documentation to clarify the focus on T4 optimizations and updated relevant function signatures.
* Refactor model components and configurations for T4 optimization
  - Updated imports and class definitions in `configs/__init__.py` to replace RTX-specific configurations with T4-optimized alternatives.
  - Modified the `MultiHeadAttention`, `T4Linear`, and `T4Embedding` classes in `models/components.py` and `models/layers.py` to reflect T4 architecture optimizations.
  - Adjusted weight initialization and scaling factors in various model classes to align with T4 specifications.
  - Enhanced documentation comments to clarify T4-specific optimizations across the codebase.
* Refactor configurations and model components for T4 optimization
  - Removed `AdaptiveMoEModelConfig` and related configurations to streamline the codebase for T4 compatibility.
  - Updated `__init__.py` to exclude `get_development_config` and adjusted imports accordingly.
  - Replaced `create_adaptive_linear` with `create_t4_linear` in model components to align with the T4 architecture.
  - Simplified dtype handling in matmul operations to focus on T4-specific optimizations.
  - Cleaned up the system information output by removing FP8 and BF16 support details.
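To make the FP16 matmul change above concrete, here is a minimal sketch of what a single T4-oriented FP16 path could look like. It is an illustration under stated assumptions: `t4_matmul` and the standalone `use_fp16_matmul` flag are hypothetical names echoing the commit messages, not code from this PR. The hardware rationale is real, though: the T4 (compute capability 7.5) has FP16 Tensor Cores but no BF16 or FP8 support, which is consistent with dropping the BF16/FP8 branches and the kernel registry.

```python
# Hypothetical sketch: t4_matmul / use_fp16_matmul mirror the commit
# messages but are NOT the PR's actual implementation.
import torch


def t4_matmul(a: torch.Tensor, b: torch.Tensor,
              use_fp16_matmul: bool = True) -> torch.Tensor:
    """Matmul with a single FP16 fast path for the Tesla T4.

    The T4 supports FP16 Tensor Cores but not BF16 or FP8, so one
    FP16 branch can replace a registry of dtype-specific kernels.
    """
    if use_fp16_matmul and a.is_cuda and b.is_cuda:
        # Run the Tensor Core path in FP16, then cast back so callers
        # keep their original dtype.
        return (a.half() @ b.half()).to(a.dtype)
    # CPU / opted-out fallback: plain matmul in the original dtype.
    return a @ b


if __name__ == "__main__":
    x = torch.randn(8, 16)
    w = torch.randn(16, 32)
    print(t4_matmul(x, w).shape)  # torch.Size([8, 32])
```

With only one dispatch decision left, a registry pattern has nothing to dispatch over, which matches its removal in the commits above.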
RohanKhanBD (Contributor, Author) commented on Nov 15, 2025:
The useless line changes are from the Ruff VS Code extension.
RohanKhanBD (Contributor, Author) commented:
Wait, you already made a commit that adds xformers (#46), so why not merge that?
Added xformers from this issue: #45
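As a rough illustration of what wiring in xformers typically looks like, here is a minimal sketch built on the library's public `memory_efficient_attention` op. Only that call is the real API; the wrapper function and tensor shapes are assumptions for the example, not this repository's code.

```python
# Illustrative sketch; only xformers.ops.memory_efficient_attention is
# the real API -- the wrapper and shapes are assumptions, not repo code.
import torch
import xformers.ops as xops


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # xformers expects (batch, seq_len, num_heads, head_dim) and selects
    # a fused memory-efficient kernel that the current GPU supports.
    return xops.memory_efficient_attention(q, k, v)


if __name__ == "__main__":
    B, S, H, D = 2, 128, 8, 64
    q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    print(attention(q, k, v).shape)  # torch.Size([2, 128, 8, 64])
```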