Releases: dame-cell/Triformer
v3.0.2
Triformer
pip install -U triformer
TritonCrossEntropyLoss (New! 🎉)
The Triton implementation of Cross Entropy Loss is optimized for both performance and memory efficiency. It fuses the forward and backward passes into a single kernel, reducing memory overhead. A key feature is in-place gradient computation: the logits tensor is reused instead of allocating new memory, giving roughly 2x memory savings over standard implementations. Chunked processing lets it handle large batches by splitting them into smaller pieces, the log-sum-exp trick keeps the computation numerically stable, and padding tokens are handled through an ignore_index parameter. This makes it particularly efficient for the large vocabulary sizes (30k-50k tokens) common in language models.
from triformer import TritonCrossEntropyLoss
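A minimal usage sketch, assuming the loss follows the nn.CrossEntropyLoss calling convention (logits of shape [tokens, vocab] and integer targets) and exposes the ignore_index parameter mentioned above; the exact constructor arguments may differ:

import torch
from triformer import TritonCrossEntropyLoss

vocab_size = 32000
# ignore_index and the call signature are assumed to mirror nn.CrossEntropyLoss
criterion = TritonCrossEntropyLoss(ignore_index=-100)
logits = torch.randn(8 * 128, vocab_size, device="cuda", requires_grad=True)
targets = torch.randint(0, vocab_size, (8 * 128,), device="cuda")
loss = criterion(logits, targets)
loss.backward()  # per the notes above, gradients reuse the logits buffer in place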
v3.0.1
Triformer
pip install -U triformer
TritonDropout (New! 🎉)
A fast and memory-efficient dropout implementation using Triton's parallel processing capabilities. Features deterministic dropout patterns with seed control and optimized memory usage through block processing. Benchmarks show comparable or better training convergence compared to PyTorch's native dropout implementation.
Example usage:
import torch
from triformer import TritonDropout

x = torch.randn(32, 512, device="cuda")  # example input tensor
# Basic usage
output = TritonDropout.apply(x, p=0.5)
# With a deterministic seed for reproducible dropout masks
output = TritonDropout.apply(x, p=0.5, seed=42)
v3.0.0
TritonLayerNorm:
Implemented Layer Normalization in Triton, designed to improve the stability and performance of transformer models. The GPU-optimized kernel enables faster training and inference.
TritonSoftmax:
Introduced an efficient Softmax implementation in Triton. This addition enables more effective processing of output layers in neural networks, particularly in tasks requiring probabilistic outputs.
Usage
from triformer import TritonLayerNorm
from triformer import TritonSoftmax
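A minimal usage sketch, assuming both classes are nn.Module-style drop-ins for nn.LayerNorm and nn.Softmax; if they are instead exposed as autograd Functions (like TritonDropout.apply above), the calls would go through .apply, and the constructor arguments may differ:

import torch
from triformer import TritonLayerNorm, TritonSoftmax

hidden = torch.randn(16, 128, 768, device="cuda")
# constructor arguments assumed to mirror nn.LayerNorm / nn.Softmax
layer_norm = TritonLayerNorm(768)
softmax = TritonSoftmax(dim=-1)
normed = layer_norm(hidden)
probs = softmax(normed)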
1.3.4
1.1.0
This release completes the implementation of our custom linear layer by adding a fully-functional backward pass, enabling end-to-end training.
What's New
- Complete Backward Pass Implementation: Added three specialized Triton kernels for efficient backpropagation (see the reference sketch after this list):
  - backward_input_kernel: Computes gradients with respect to the input
  - backward_weight_kernel: Computes gradients with respect to the weights
  - fused_relu_bias_backward_kernel: Fused computation of bias gradients and the ReLU backward pass
- Performance Optimizations:
  - Kernel fusion to minimize memory operations
  - Autotuned configurations for optimal performance
  - Block-based computation patterns for efficient GPU utilization
  - Mixed precision (float32 for computation, float16 for storage)
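For reference, the gradients these three kernels compute correspond to the standard formulas for a fused linear + ReLU layer y = relu(x @ W.T + b). The sketch below is plain PyTorch (not the Triton kernels themselves), with illustrative shapes; the actual kernels fuse and tile this work on the GPU:

import torch

x = torch.randn(32, 512)          # input
W = torch.randn(256, 512)         # weight (out_features x in_features)
b = torch.randn(256)              # bias
grad_out = torch.randn(32, 256)   # upstream gradient dL/dy

pre_act = x @ W.t() + b
relu_mask = (pre_act > 0).to(grad_out.dtype)
grad_pre = grad_out * relu_mask      # fused_relu_bias_backward_kernel: ReLU backward...
grad_bias = grad_pre.sum(dim=0)      # ...and bias gradient
grad_input = grad_pre @ W            # backward_input_kernel
grad_weight = grad_pre.t() @ x       # backward_weight_kernel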
Previous Release
- Forward pass implementation with fused linear transformation and ReLU activation
Technical Details
The backward pass maintains the same performance philosophy as the forward pass:
- Leverages Triton for GPU acceleration
- Uses autotuning to optimize kernel configurations
- Implements efficient memory access patterns
- Maintains numerical stability through careful handling of data types
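As an illustration of the float32-compute / float16-storage pattern listed above, a minimal standalone Triton kernel (not taken from the library) could look like this: inputs are loaded as float16, upcast for the arithmetic, and downcast only when written back to memory.

import torch
import triton
import triton.language as tl

@triton.jit
def scale_add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    # Load float16 inputs and upcast to float32 for the arithmetic
    x = tl.load(x_ptr + offsets, mask=mask).to(tl.float32)
    y = tl.load(y_ptr + offsets, mask=mask).to(tl.float32)
    result = x * 2.0 + y  # computation happens in float32
    # Downcast back to float16 only when writing to memory
    tl.store(out_ptr + offsets, result.to(tl.float16), mask=mask)

x = torch.randn(4096, device="cuda", dtype=torch.float16)
y = torch.randn(4096, device="cuda", dtype=torch.float16)
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
scale_add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)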
Usage
The layer can now be used as a drop-in replacement for nn.Linear in training scenarios:
import torch
import torch.nn as nn
from triformer import TritonLinear  # import path assumed to mirror the other examples

layer = TritonLinear(in_features=512, out_features=256).cuda()
optimizer = torch.optim.Adam(layer.parameters())
criterion = nn.MSELoss()  # any loss works; MSE chosen for illustration
input_tensor = torch.randn(32, 512, device="cuda")
target = torch.randn(32, 256, device="cuda")
# Forward pass
output = layer(input_tensor)
loss = criterion(output, target)
# Backward pass (now supported!)
loss.backward()
optimizer.step()