
OmniByteFormer


OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing traditional tokenization or specific data-type encodings. Whether the input is text, images, videos, audio, or other data formats, OmniByteFormer treats all data uniformly as bytes, and generates the output directly in bytes. This makes OmniByteFormer a flexible and universal model for multi-modal tasks.

Key Features

  • Universal Input: Accepts various data types (text, image, audio, video, etc.) by converting them into byte sequences.
  • Transformer-Based Architecture: Uses the power of Transformer models for generative tasks with arbitrary data.
  • Byte-Level Processing: Instead of tokenizing or using modality-specific encodings, it processes byte sequences directly, offering a uniform representation for all data types.
  • Multi-Modal Compatibility: Can be trained to generate text, images, videos, or even sound as output from different types of input data.

Architecture

OmniByteFormer is built on a byte-level Transformer architecture. The core model leverages the following components (a minimal sketch follows the list):

  • Byte Embeddings: Converts each byte (0-255) into a learnable embedding vector.
  • Transformer Encoder-Decoder: Applies self-attention and cross-attention mechanisms on byte sequences, enabling the model to learn representations across different modalities.
  • Positional Encoding: Ensures the model retains sequence information by encoding position in byte sequences.
  • Universal Decoder: Outputs the byte sequences, which can be converted back into the original data types (text, image, video, etc.).
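The sketch below shows, in PyTorch, how these components could fit together. The class names, dimensions, and layer counts are illustrative assumptions, not the repository's actual implementation.

# Minimal sketch of a byte-level encoder-decoder Transformer (illustrative only;
# class and argument names here are assumptions, not the repository's code).
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding over byte positions."""

    def __init__(self, d_model: int, max_len: int = 4096):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]


class ByteTransformer(nn.Module):
    """Encoder-decoder Transformer operating directly on byte values 0-255."""

    def __init__(self, d_model: int = 256, nhead: int = 8, num_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)      # byte embeddings: one vector per byte value
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, 256)           # logits over the 256 possible output bytes

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # src, tgt: (batch, seq_len) integer byte IDs in [0, 255]
        src_emb = self.pos(self.embed(src))
        tgt_emb = self.pos(self.embed(tgt))
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.transformer(src_emb, tgt_emb, tgt_mask=causal)
        return self.out(hidden)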

Training the Model

python train_byte_transformer.py --epochs 10 --batch_size 32 --lr 1e-4 --seq_len 128 --save_path ./checkpoints
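At the byte level, dataset preparation is simple: any file can be read as raw bytes and split into --seq_len-sized windows. The snippet below is a hedged sketch of that step; the file name and Dataset layout are assumptions, not the training script's actual code.

# Sketch: turn any file into fixed-length byte sequences for training
# (the file name and Dataset layout are illustrative assumptions).
import torch
from torch.utils.data import Dataset, DataLoader


class ByteChunkDataset(Dataset):
    """Splits a file's raw bytes into non-overlapping windows of seq_len bytes."""

    def __init__(self, path: str, seq_len: int = 128):
        data = open(path, "rb").read()       # works for text, images, audio, ...
        self.chunks = [
            data[i : i + seq_len]
            for i in range(0, len(data) - seq_len + 1, seq_len)
        ]

    def __len__(self) -> int:
        return len(self.chunks)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Each byte becomes an integer token in [0, 255].
        return torch.tensor(list(self.chunks[idx]), dtype=torch.long)


# Matches the flags above: --batch_size 32 --seq_len 128
loader = DataLoader(ByteChunkDataset("corpus.txt", seq_len=128), batch_size=32, shuffle=True)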

Example

Converting Outputs Back to Data

After generating byte sequences with OmniByteFormer, you can convert the byte outputs back into their original format (text, image, etc.). You’ll need to decode these bytes appropriately based on your task (e.g., text decoding or saving image files).
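A minimal sketch of that decoding step, assuming the model's output has already been collected as a list of byte values named `generated` (a placeholder, not an API from this repository):

# Sketch: convert generated byte values back into usable data.
generated = [72, 101, 108, 108, 111, 33]    # example byte IDs in [0, 255]
raw = bytes(generated)

# Text: decode, replacing any bytes that don't form valid UTF-8
text = raw.decode("utf-8", errors="replace")
print(text)                                 # -> "Hello!"

# Binary formats (images, audio, ...): write the bytes straight to a file
with open("output.bin", "wb") as f:
    f.write(raw)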

Examples

  • Text Generation: Convert text to bytes and generate text from bytes.
  • Image-to-Image Generation: Convert images to bytes, pass them through the model, and generate new images (see the sketch after this list).
  • Audio/Video Processing: Work with audio or video data by converting them to byte sequences.
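As a concrete illustration of the image case, a raw pixel buffer round-trips cleanly through a byte sequence. The array below is synthetic and stands in for a decoded image; the repository's own preprocessing may differ.

# Sketch: round-tripping an image through a byte sequence (the array here is
# synthetic; in practice it would come from decoding an actual image file).
import numpy as np

image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)  # fake RGB image

# Image -> byte sequence (ints in [0, 255]) suitable as model input
byte_seq = list(image.tobytes())

# Byte sequence -> image: reinterpret the bytes and restore the original shape
restored = np.frombuffer(bytes(byte_seq), dtype=np.uint8).reshape(image.shape)
assert np.array_equal(image, restored)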

Contributing

Feel free to contribute to this project. Fork the repo, make changes, and submit a pull request.

License

This project is licensed under the MIT License.
