
How to use distributed-training in single client for llm #4925

Open
hejxiang opened this issue Feb 11, 2025 · 0 comments
What is your question?

I have successfully completed federated training of an LLM by following the tutorial below.
https://flower.ai/docs/framework/tutorial-quickstart-huggingface.html

Now I want to train a larger model. How can I fine-tune an LLM using distributed training on a single client?
I would like to use multiple GPUs, or multiple machines, for one client. Can I use something like `accelerate launch`, `torchrun`, or `deepspeed` inside a client?

https://huggingface.co/docs/trl/example_overview#distributed-training
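
For reference, here is a rough sketch of what I am imagining, based on the `NumPyClient` API from the quickstart. The script name `train.py`, the weight file paths, and the `num-examples` config key are all placeholders I made up, not anything from the tutorial. The idea is that the Flower client stays a single process, and `fit()` shells out to `accelerate launch` so the actual fine-tuning step can use every local GPU, with weights exchanged through files on disk:

```python
import subprocess

import torch
import flwr as fl


class DistributedTrainingClient(fl.client.NumPyClient):
    def __init__(self, weights_in="round_in.pt", weights_out="round_out.pt"):
        self.weights_in = weights_in    # global weights received from the server
        self.weights_out = weights_out  # locally fine-tuned weights to send back

    def fit(self, parameters, config):
        # Save the global parameters so every spawned worker can load them.
        torch.save([torch.tensor(p) for p in parameters], self.weights_in)

        # Launch multi-GPU fine-tuning as its own process group.
        # `train.py` is a hypothetical script that loads `weights_in`,
        # runs one round of local fine-tuning, and writes `weights_out`.
        subprocess.run(
            ["accelerate", "launch", "train.py",
             "--weights-in", self.weights_in,
             "--weights-out", self.weights_out],
            check=True,
        )

        # Read the updated weights back and return them to the server.
        new_params = [t.cpu().numpy() for t in torch.load(self.weights_out)]
        num_examples = int(config.get("num-examples", 1))  # placeholder count
        return new_params, num_examples, {}
```

Is this kind of subprocess-based approach the recommended way, or is there built-in support in Flower for multi-GPU / multi-node training within a single client?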

Thanks!
