Collaborator

@irexyc irexyc commented Mar 28, 2025

Usage

A two-node example with tp=2, dp=1, and device_num=2:

node 0

export CUDA_VISIBLE_DEVICES=0
lmdeploy serve api_server \
    Qwen/Qwen2.5-7B-Instruct \
    --server-port 29200 \
    --tp 2 \
    --dp 1 \
    --nnodes 2 \
    --node-rank 0 \
    --dist-init-addr 127.0.0.1:7888

node 1

export CUDA_VISIBLE_DEVICES=1
lmdeploy serve api_server \
    Qwen/Qwen2.5-7B-Instruct \
    --server-port 29201 \
    --tp 2 \
    --dp 1 \
    --nnodes 2 \
    --node-rank 1 \
    --dist-init-addr 127.0.0.1:7888
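
The two commands above differ only in the GPU index, the server port, and the node rank. A minimal sketch of how they could be generated from the rank follows. It only prints the commands and does not launch anything. `launch_cmd` is a hypothetical helper, not part of lmdeploy, and the rule that the GPU index and port offset follow the node rank is an assumption read off this one example, not something the PR states.

```shell
#!/bin/sh
# Sketch only: prints the per-node launch command instead of running it.
# Assumption: GPU index and port offset both equal the node rank,
# as they happen to in the two-node example above.

MODEL="Qwen/Qwen2.5-7B-Instruct"
BASE_PORT=29200
NNODES=2
INIT_ADDR="127.0.0.1:7888"

# launch_cmd is a hypothetical helper, not part of lmdeploy.
# It uses only the flags that appear in the usage example.
launch_cmd() {
    rank="$1"
    port=$((BASE_PORT + rank))
    printf 'CUDA_VISIBLE_DEVICES=%s lmdeploy serve api_server %s --server-port %s --tp 2 --dp 1 --nnodes %s --node-rank %s --dist-init-addr %s\n' \
        "$rank" "$MODEL" "$port" "$NNODES" "$rank" "$INIT_ADDR"
}

# Print one command per node rank.
i=0
while [ "$i" -lt "$NNODES" ]; do
    launch_cmd "$i"
    i=$((i + 1))
done
```

Every node must pass the same `--dist-init-addr`, while `--node-rank` must be unique per node, as in the original commands.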

@irexyc irexyc added the WIP label Mar 28, 2025
@irexyc irexyc changed the title from "[WIP] Add Gloo communication to turobmind" to "Add Gloo communication to turobmind" Apr 10, 2025
@irexyc irexyc removed the WIP label Apr 10, 2025
Collaborator Author

irexyc commented Apr 10, 2025

oc evaluate diff.csv

@lvhan028 lvhan028 added the enhancement New feature or request label Apr 10, 2025
@lvhan028 lvhan028 requested a review from lzhangzz April 24, 2025 03:45
@lzhangzz lzhangzz changed the title from "Add Gloo communication to turobmind" to "Add Gloo communication to turbomind" Dec 11, 2025
@lvhan028 lvhan028 merged commit def3052 into InternLM:main Dec 25, 2025
9 checks passed
irexyc added a commit that referenced this pull request Jan 6, 2026
* init gloo support

* use pytorch tcpstore

* update gateway and support setting devices

* fix build

* use tm cfg instead of env

* fix dp

* fix lint

* fix build

* fix ci

* update gloo version to match pytorch/v2.8.0-rc4

* simplify devices setup

* change the size of engine_params_ to device_per_node

* use dist_init_addr for init

* remove unused

* update

* optimize serialization

* buffer management

* fix wait

* remove constraint that each node must have attn_dp

* add hybrid comm & optimize broadcast

* add test & benchmark code

* add ibverbs transport

* remove grammar deps in irrelevant cmakelists

* use serdes

* hide hostcomm implementation details

* skip serialize buffer of Request.outputs

* fix try_pop

* use default 30mins timeout

* support loading model with 512 experts

* remove unused

* remove ex archive

* use is_loading static var

* fix dummy node logic

* use large timeout for broadcast request

* add comments to metrics

* use hybrid comm as default for multi nodes

* update inter comm split in hybrid comm

* remove unused

* fix lint