- [2025/07/07] We support finetuning Qwen2-Audio-7B & Kimi-Audio-7B on the ASR task! See WenetSpeech results for details.
👆 touchnet is heavily inspired by torchtitan. Both are clean, minimal codebases for large-scale LLM training using native PyTorch. The main goal that differentiates 👆 touchnet from torchtitan is that 👆 touchnet focuses on multimodal LLM training, where special data pipelines and model structures are needed. Please note that 👆 touchnet is currently in a pre-release state and under extensive development.

Our guiding principles when building 👆 touchnet are:
- ⚡️ Blazing-fast checkpointable data loader with modular preprocessing and fully random access for large-scale multimodal data
  - [New Storage Format] optimized for random access on sequentially saved tar files
  - Efficient [Sequence Packing] powered by [Flex Attention]
- 🤗 Native integration with transformers models while getting rid of structured trainer classes (e.g., [PyTorch-Lightning] or [HuggingFace Trainer])
  - Only reuse model definitions in transformers and leave other parts untouched
  - Entire training logic exposed in a single file, [touchnet/bin/train.py]: everything is under your control
- 🛠️ Built-in profilers (CPU/GPU/memory) with flight recorder diagnostics.
  - [Nsys-like Profiler] to get optimization recommendations
  - [Memory Monitor] to debug OOM errors and improve memory usage
- 🎯 N-D parallelism enabled through PyTorch native API and minimal lines of model code changes.
- ✨ Intuitive API design for rapid adoption & customization in minutes.
- Supported tasks: [text/pretrain], [audio/pretrain], [audio/sft/asr], more tasks coming soon
- Supported models: [Llama], [LlamaForASR], more models coming soon
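The "random access on sequentially saved tar files" idea from the storage-format bullet above can be sketched with nothing but Python's stdlib `tarfile`: scan a shard once to record each member's byte offset and size, then seek straight to any sample. This is a minimal illustration of the general technique, not touchnet's actual `TouchDataset` layout; the function names are hypothetical, and direct seeking assumes uncompressed tars.

```python
import io
import os
import tarfile
import tempfile

def build_tar_index(tar_path):
    """Scan an (uncompressed) tar once, recording each member's payload offset and size."""
    index = {}
    with tarfile.open(tar_path, "r") as tf:
        for member in tf:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def random_read(tar_path, index, name):
    """Fetch one member by seeking directly to its payload; no re-scan, no extraction."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Demo: write a tiny two-utterance shard, then read members in arbitrary order.
workdir = tempfile.mkdtemp()
shard = os.path.join(workdir, "shard.tar")
with tarfile.open(shard, "w") as tf:
    for name, payload in [("utt1.txt", b"hello"), ("utt2.wav", b"\x00\x01\x02")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

idx = build_tar_index(shard)
print(random_read(shard, idx, "utt2.wav"))  # direct seek-and-read of the second member
```

Because the index is tiny (one `(offset, size)` pair per sample), it can be kept in memory or checkpointed alongside the data loader state, which is what makes fully random access over sequential shards cheap.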
touchnet_glance2.mp4
Loss, Accuracy, Memory, Throughput, TFLOPs, and MFU logged via both stdout and Tensorboard.
touchnet_tb2.mp4
Detailed CPU/GPU profiling that can be visualized in Tensorboard. Enjoy your optimization journey ~
touchnet_mem.mp4
Memory profiling identifies GPU memory allocation patterns to guide tuning strategies.
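touchnet's Memory Monitor targets GPU memory; as a rough CPU-side analogue of the same idea (snapshotting allocation statistics to find hotspots), Python's stdlib `tracemalloc` attributes live allocations to source lines. A sketch of the technique only, not touchnet's monitor:

```python
import tracemalloc

tracemalloc.start()

# Simulate a suspicious buffer: ~1 MB of small allocations kept alive.
buffers = [bytearray(1024) for _ in range(1024)]

# Snapshot the heap and rank allocation sites by traced size.
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")
for stat in top[:3]:
    print(stat)  # the list comprehension above should dominate

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

The same workflow (snapshot, diff, rank by site) is how one typically narrows down OOM-style leaks before reaching for a GPU-aware tool.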
Here is an end-to-end workflow for a training job in 👆 TouchNet:
- stage-1: Download dataset. We use the `load_dataset` API in HuggingFace `datasets` to download specific data.
- stage-2: Convert dataset format to `TouchDataset`. See [touchnet/bin/make_data.py]
- stage-3: (optional) Convert hf-format ckpt to torch distributed ckpt. See [touchnet/bin/convert_hf_to_dcp.py]
- stage-4: Start training, either from scratch or from a pretrained ckpt converted in stage-3. See [touchnet/bin/train.py]
- stage-5: Convert torch distributed ckpt back to hf-format and enjoy the HuggingFace ecosystem for inference and deployment. See [touchnet/bin/convert_dcp_to_hf.py]
For a more concrete example running those stages one by one, see [examples/audio/sft/asr/aishell/run.sh]
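In script form, the five stages map onto the repo's entry points roughly like this. This is an outline only: the actual command-line arguments are not shown in this README, so they are elided below, and the `torchrun` launch for stage-4 is an assumption; consult the run.sh above for the real invocations.

```sh
# Illustrative outline; see examples/audio/sft/asr/aishell/run.sh for real arguments.
# stage-1: download data via HuggingFace `datasets` (load_dataset) in your own script
python touchnet/bin/make_data.py ...           # stage-2: raw data -> TouchDataset
python touchnet/bin/convert_hf_to_dcp.py ...   # stage-3 (optional): hf ckpt -> torch DCP
torchrun touchnet/bin/train.py ...             # stage-4: training (torchrun is an assumption)
python touchnet/bin/convert_dcp_to_hf.py ...   # stage-5: torch DCP -> hf ckpt
```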
```sh
# NOTE(xcsong): Ensure that the Linux system's glibc version is greater than or equal to 2.17 (see `ldd --version`)
# (for example, Ubuntu 22.04 and later versions).
conda create -n touchnet python=3.10
conda activate touchnet
conda install -c conda-forge sox ffmpeg -y

# (Optional) install CUDA + cuDNN if they are not already available; change `prefix` to your install path.
# bash install_cuda_cudnn.sh

# Install TouchNet with GPU support (CUDA 12.6 - recommended)
pip install -e . --index-url https://download.pytorch.org/whl/cu126

# Or install with CUDA 11.8 support
# pip install -e . --index-url https://download.pytorch.org/whl/cu118

# For development with GPU support
# pip install -e '.[dev]' --index-url https://download.pytorch.org/whl/cu126
```
```bibtex
@misc{touchnet,
  title={TouchNet: A PyTorch native N-D parallel library for large-scale multimodal LLM (text/audio) training},
  author={Xingchen Song},
  year={2025},
  url={https://github.com/xingchensong/TouchNet},
}
```
- This repo is heavily inspired by torchtitan, and we borrowed a lot of code from it.
- This repo also benefits from Megatron-LM, WeNet, and flame.

Thanks for their wonderful work.