Replies: 2 comments
-
BTW the command is:

sudo docker run \
--rm \
--name tt \
--gpus all \
--net=host \
--shm-size=1g \
--ulimit memlock=-1 \
-p 20000:20000 \
-p 30000:30000 \
-v $DATA_DIR/db24/workspace/llm_weights:/llm_weights \
--env "GLOO_SOCKET_IFNAME=bond0" \
--env "NCCL_SOCKET_IFNAME=bond0" \
--env "NCCL_DEBUG=TRACE" \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server --model-path /llm_weights/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 --host 0.0.0.0 --port 30000 --tp 4 --nccl-init-addr node0:20000 --nnodes 4 --node-rank $NR --disable-cuda-graph --mem-frac 0.9
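
The same command runs on every node, with $NR set to that node's rank (0-3) and node0 reachable from all nodes. Once the rank-0 server is up, a quick smoke test could look like the sketch below (assuming sglang's native /generate endpoint on the host/port from the command above; adjust to your setup):

```python
# Minimal smoke test for the multi-node deployment (a sketch, assuming the
# server's native /generate endpoint; host/port taken from the command above).
import requests

resp = requests.post(
    "http://node0:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json())
```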
-
We are working on implementing pipeline parallelism. It will be available soon.
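
In the meantime, here is a toy sketch of the GPipe-style idea the question refers to (not sglang's implementation; `Stage` and `run_pipeline` are names invented for illustration): each node holds a contiguous slice of layers, and the batch is split into micro-batches that flow stage to stage, so only activations cross node boundaries.

```python
# Toy illustration of GPipe-style pipeline parallelism (NOT sglang's code).
# Each Stage stands for the contiguous slice of layers hosted on one node;
# micro-batches flow stage -> stage, so only activations (not weights)
# need to cross node boundaries.
from typing import Callable, List

class Stage:
    def __init__(self, forward: Callable[[List[float]], List[float]]):
        self.forward = forward  # this node's slice of the model

def run_pipeline(stages: List[Stage], batch: List[float],
                 num_microbatches: int = 4) -> List[float]:
    size = max(1, len(batch) // num_microbatches)
    microbatches = [batch[i:i + size] for i in range(0, len(batch), size)]
    outputs: List[float] = []
    for mb in microbatches:      # in real GPipe these overlap across stages
        x = mb
        for stage in stages:     # activation hand-off between nodes
            x = stage.forward(x)
        outputs.extend(x)
    return outputs

if __name__ == "__main__":
    # Four "nodes", each applying a stand-in for its slice of layers.
    stages = [Stage(lambda xs, k=k: [v + k for v in xs]) for k in range(4)]
    print(run_pipeline(stages, [float(v) for v in range(8)]))
```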
-
Currently I run Llama 3.1 405B AWQ-INT4 on 4 nodes, each with 4× 40 GB GPUs.

First I used `--tp=16`, which breaks the GEMMs into smaller blocks, but the inference speed is very slow (Decode batch. #running-req: 1, #token: 2451, token usage: 0.01, gen throughput (token/s): 4.23, #queue-req: 0).

Then I tried `--tp=4`, but loading the model ran out of GPU memory. What I was trying to achieve was tensor parallelism within each node's 4 GPUs over NVLink, with activations passed across nodes.

Next I tried `--tp=8` and found that each node only used 2 GPU cards, so I guess `--tp` with multiple nodes indicates how many cards to use in total.

I'm not sure whether sglang supports this scenario: tensor parallelism within each node, with the model split into stages across nodes in the GPipe or PipeDream style. If this scenario is supported, how do I configure it?

Putting the scenario into a diagram:
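
For reference, a rough back-of-the-envelope estimate (a sketch assuming ~405B parameters at ~0.5 bytes each for the 4-bit weights, ignoring KV cache, activations, and quantization metadata) matches what I observed: `--tp=4` cannot hold the weights on 40 GB cards, while `--tp=16` fits but sends activations across nodes at every layer.

```python
# Rough per-GPU weight footprint for a ~405B-parameter model in 4-bit (AWQ INT4).
# Assumptions (sketch only): ~0.5 bytes/param, ignoring KV cache, activations,
# and quantization scales/zero-points; GB treated as GiB for this estimate.
PARAMS = 405e9
BYTES_PER_PARAM = 0.5   # 4-bit weights
GPU_MEM_GB = 40         # each card in this setup

weights_gib = PARAMS * BYTES_PER_PARAM / 2**30   # ~189 GiB total
for tp in (4, 8, 16):
    per_gpu = weights_gib / tp   # tensor parallelism shards every weight matrix over tp GPUs
    status = "fits" if per_gpu < GPU_MEM_GB else "exceeds"
    print(f"tp={tp:2d}: ~{per_gpu:5.1f} GiB of weights per GPU ({status} a {GPU_MEM_GB} GB card)")
# tp=4  -> ~47 GiB per GPU: more than 40 GB, hence the out-of-memory on load.
# tp=16 -> ~12 GiB per GPU: fits, but every layer's GEMM now spans all 4 nodes,
#          so activations cross the inter-node network at every layer (slow decode).
```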