
[ci] feat: add more CI workflow #38

Merged
merged 73 commits into from
Jan 9, 2025
73 commits
f4ee653
[ci] upload several tests
PeterSH6 Dec 6, 2024
008a73c
[ci] add sanity and tensordict utility workflow
PeterSH6 Dec 6, 2024
7aa51c0
[ci] fix workflow
PeterSH6 Dec 6, 2024
3c729fd
try fix import ci
PeterSH6 Dec 6, 2024
7425b36
[dataproto] update repeat and unpad/pad
PeterSH6 Dec 7, 2024
23cffb4
fix rollout test to 2GPU
PeterSH6 Dec 7, 2024
7865830
merge master
PeterSH6 Jan 6, 2025
3357d99
add a fsdp vllm hybridengine script, which can be launched by torchrun
PeterSH6 Jan 6, 2025
8734166
fix import test
PeterSH6 Jan 6, 2025
de3b72e
merge master
PeterSH6 Jan 6, 2025
675bff5
update requirement.txt
PeterSH6 Jan 6, 2025
cc99c80
draft vllm fsdp test
PeterSH6 Jan 6, 2025
9756354
update label
PeterSH6 Jan 6, 2025
dc205f2
fix
PeterSH6 Jan 6, 2025
79ae3b0
upload conda
PeterSH6 Jan 7, 2025
2f2bf3d
test conda
PeterSH6 Jan 7, 2025
9bb2cbe
test ci
PeterSH6 Jan 7, 2025
6efecc4
use docker
PeterSH6 Jan 7, 2025
a4bd1cd
test ci
PeterSH6 Jan 7, 2025
1b713fb
test ci
PeterSH6 Jan 7, 2025
5e173a4
test ci
PeterSH6 Jan 7, 2025
5f541cb
update ci
PeterSH6 Jan 7, 2025
3e63cdd
test ci
PeterSH6 Jan 7, 2025
e762e98
fix model loader
PeterSH6 Jan 7, 2025
35f3b74
fix model loader
PeterSH6 Jan 7, 2025
3c2f36e
test ci
PeterSH6 Jan 7, 2025
9f8e7db
test
PeterSH6 Jan 7, 2025
d47180a
upload e2e digit completion test
PeterSH6 Jan 8, 2025
902bbde
update running script for e2e test
PeterSH6 Jan 8, 2025
bfc8adc
update test config
PeterSH6 Jan 8, 2025
870f938
fix path
PeterSH6 Jan 8, 2025
683fbdc
test
PeterSH6 Jan 8, 2025
4332f57
fix import to register autotokenizer
PeterSH6 Jan 8, 2025
e1a6a5b
fix tokenizer
PeterSH6 Jan 8, 2025
0e079e6
fix create dataset
PeterSH6 Jan 8, 2025
175845a
fix
PeterSH6 Jan 8, 2025
1f915aa
fix reward model validate
PeterSH6 Jan 8, 2025
331bf9c
fix reward module of digit_completion
PeterSH6 Jan 8, 2025
8024c87
fix reward module of digit_completion
PeterSH6 Jan 8, 2025
ac8f4ee
fix reward module of digit_completion
PeterSH6 Jan 8, 2025
c66846f
fix reward module of digit_completion
PeterSH6 Jan 8, 2025
b7178c3
fix reward module of digit_completion
PeterSH6 Jan 8, 2025
42d4d79
can run but seems to have some test issue
PeterSH6 Jan 8, 2025
db6e8be
no problem, add check results
PeterSH6 Jan 8, 2025
a6829e6
add e2e training
PeterSH6 Jan 8, 2025
283aa2e
l20-0 seems has docker permission problem, test later
PeterSH6 Jan 8, 2025
9ebdfca
fix
PeterSH6 Jan 8, 2025
ed6d804
test l20-0 and torchrun
PeterSH6 Jan 8, 2025
c37aa78
test l20-0 and torchrun
PeterSH6 Jan 8, 2025
ba84d4c
fix
PeterSH6 Jan 8, 2025
e69c648
fix
PeterSH6 Jan 8, 2025
dc9563d
fix
PeterSH6 Jan 8, 2025
b5371b8
fix
PeterSH6 Jan 8, 2025
bccdb0e
fix
PeterSH6 Jan 8, 2025
6d9b85f
tolerate difference
PeterSH6 Jan 8, 2025
dfa3cb7
tolerate difference with levenshtein
PeterSH6 Jan 8, 2025
705d10a
lint
PeterSH6 Jan 8, 2025
dc78938
add more test for ray
PeterSH6 Jan 8, 2025
c04fca8
delete
PeterSH6 Jan 8, 2025
3c92fc6
use docker on l20
PeterSH6 Jan 8, 2025
091bec8
use docker on l20
PeterSH6 Jan 8, 2025
4d74389
add upgrade
PeterSH6 Jan 8, 2025
2c84712
update ci
PeterSH6 Jan 8, 2025
c069f6b
delete code
PeterSH6 Jan 8, 2025
414f0f2
ignore test
PeterSH6 Jan 8, 2025
b011882
upgrade ray
PeterSH6 Jan 8, 2025
433a678
fix workerhelper method
PeterSH6 Jan 8, 2025
101ae14
lint
PeterSH6 Jan 8, 2025
526fd31
revert worker changes
PeterSH6 Jan 8, 2025
611a630
fix
PeterSH6 Jan 8, 2025
3e56953
fix
PeterSH6 Jan 8, 2025
7f5e1ac
fix
PeterSH6 Jan 8, 2025
b6ed2d5
fix worker missing func
PeterSH6 Jan 8, 2025
test l20-0 and torchrun
PeterSH6 committed Jan 8, 2025
commit ed6d8042092db19f078bb12e0eab9a30444f50a0
2 changes: 1 addition & 1 deletion .github/workflows/vllm.yml
@@ -18,7 +18,7 @@ on:
 
 jobs:
   vllm:
-    runs-on: [self-hosted, l20-1]
+    runs-on: [self-hosted, l20-0]
     env:
       HTTP_PROXY: ${{ secrets.PROXY_HTTP }}
       HTTPS_PROXY: ${{ secrets.PROXY_HTTPS }}
1 change: 1 addition & 0 deletions tests/rollout/test_vllm_hf_loader.py
@@ -121,6 +121,7 @@ def test_vllm_with_hf():
     print(f'hf response: {tokenizer.batch_decode(response)}')
     print(f'vllm response: {tokenizer.batch_decode(vllm_output)}')
     assert torch.allclose(response, vllm_output), f'hf_response:{response} | vllm_response:{vllm_output}'
+    assert False
     print('Check Pass')
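The test above compares HF and vLLM outputs with an exact `torch.allclose` check, while the later commit dfa3cb7 ("tolerate difference with levenshtein") suggests relaxing that to an edit-distance tolerance. The following is a minimal sketch of that idea, not the code from this PR: the helper names `levenshtein` and `sequences_match` and the `max_ratio` threshold are hypothetical. It assumes token IDs have already been converted to plain Python lists (for example with `Tensor.tolist()`).

```python
from typing import Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two token-id sequences (insert/delete/substitute cost 1)."""
    # prev[j] holds the distance between the first i-1 items of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def sequences_match(hf_ids: Sequence[int], vllm_ids: Sequence[int], max_ratio: float = 0.1) -> bool:
    """Hypothetical check: True if the edit distance is at most max_ratio of the longer length."""
    longest = max(len(hf_ids), len(vllm_ids), 1)  # avoid dividing by zero for empty outputs
    return levenshtein(hf_ids, vllm_ids) / longest <= max_ratio


# Usage: one differing token out of ten stays within a 10% tolerance.
hf_batch = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
vllm_batch = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 11]]
assert all(sequences_match(h, v) for h, v in zip(hf_batch, vllm_batch))
```

A normalized distance (rather than a raw count) keeps the tolerance meaningful across responses of different lengths. The 10% default is an arbitrary illustrative choice, not a value taken from the repository.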