cannot start training with uv #1131

@jalola

Describe the bug

I tried to run training with uv inside the NeMo Automodel Docker container, but none of the approaches I tried worked.

Steps/Code to reproduce bug

Prepare the code and start the container:

git clone https://github.com/NVIDIA-NeMo/Automodel.git
cd Automodel
docker run --gpus all --network=host -it --rm -v $(pwd):/workspace/Automodel -v $(pwd):/opt/Automodel --shm-size=32g nvcr.io/nvidia/nemo-automodel:25.11.00 /bin/bash

I tried three different approaches, and each failed with a different error:

export HF_TOKEN=...
uv run Automodel/examples/vlm_finetune/finetune.py --config Automodel/examples/vlm_finetune/gemma3/gemma3_vl_4b_cord_v2_peft.yaml
>>> ImportError: qwen_vl_utils is not installed. Please install it with `pip install qwen-vl-utils`.

uv run torchrun --nproc-per-node=4 Automodel/examples/llm_finetune/finetune.py --config Automodel/examples/llm_finetune/qwen/qwen3_moe_30b_lora.yaml
>>> AttributeError: module 'nemo_automodel.components.models.common' has no attribute 'BackendConfig'

cd Automodel
uv sync
uv run torchrun --nproc-per-node=4 examples/llm_finetune/finetune.py --config examples/llm_finetune/nemotron/nemotron_nano_v3_squad_peft.yaml
>>> RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA major versions. PyTorch has CUDA Version=12.9 and torchvision has CUDA Version=13.0. Please reinstall the torchvision that matches your PyTorch install.
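
For anyone reproducing the last error, here is a minimal sketch for comparing the CUDA build tags of the installed torch and torchvision packages without importing torchvision (the import itself is what raises the RuntimeError). It reads version strings from package metadata and assumes the wheels carry a `+cuNNN` local tag, as wheels from the PyTorch index typically do; builds shipped inside the container may not have such a tag, in which case the tag is reported as None. The function names and the version strings used in the comments are hypothetical examples, not values taken from this environment.

```python
import re
from importlib import metadata


def cuda_tag(version):
    """Return the 'cuNNN' local build tag from a version string, or None if absent."""
    match = re.search(r"\+cu(\d+)", version)
    return f"cu{match.group(1)}" if match else None


def cuda_major(version):
    """Return the CUDA major version encoded in the tag, or None if there is no tag."""
    tag = cuda_tag(version)
    # Tags such as cu118, cu129 and cu130 use two digits for the major version.
    return int(tag[2:4]) if tag else None


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Build one summary line per package, using metadata only (no imports of torch)."""
    lines = []
    for package in ("torch", "torchvision"):
        version = installed_version(package)
        if version is None:
            lines.append(f"{package}: not installed")
        else:
            lines.append(f"{package}: {version} (CUDA tag: {cuda_tag(version)})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Saved under a hypothetical name such as check_cuda_tags.py, it can be run with `uv run python check_cuda_tags.py` so that it inspects the same environment that uv uses for training. If the two tags differ in their major version, that matches the mismatch described in the RuntimeError.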

Additional context

N/A
