
wip: test transport layer improvements #395

Closed
csegarragonz wants to merge 3 commits into main from transport-improvements

Conversation

@csegarragonz
Collaborator

This PR combines:

We will merge them separately, but ideally both PRs above should be green first. We can then measure the benefits (hopefully we are on par with #382).

csegarragonz changed the title from "Transport improvements" to "wip: test transport layer improvements" on Mar 13, 2024
@lgarithm
Contributor

lgarithm commented Mar 13, 2024

5b7667e

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.028GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0123s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0122s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0116s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.337GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2547s, total workload: 1.144GiB, rate: 4.491GiB/s
bench_allreduce(np=4) took 0.2571s, total workload: 1.144GiB, rate: 4.448GiB/s
bench_allreduce(np=4) took 0.2604s, total workload: 1.144GiB, rate: 4.391GiB/s
bench_allreduce(np=4) took 0.2504s, total workload: 1.144GiB, rate: 4.567GiB/s
bench_allreduce(np=4) took 0.2560s, total workload: 1.144GiB, rate: 4.468GiB/s
bench_allreduce(np=4) took 0.2545s, total workload: 1.144GiB, rate: 4.494GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2533s, total workload: 1.144GiB, rate: 4.515GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3831s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3537s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3587s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3512s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3410s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3503s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3591s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3488s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3470s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0902s, total workload: 1.144GiB, rate: 1.049GiB/s
bench_allreduce(np=4) took 1.0794s, total workload: 1.144GiB, rate: 1.060GiB/s
bench_allreduce(np=4) took 0.9957s, total workload: 1.144GiB, rate: 1.149GiB/s
bench_allreduce(np=4) took 0.8494s, total workload: 1.144GiB, rate: 1.346GiB/s
bench_allreduce(np=4) took 0.7802s, total workload: 1.144GiB, rate: 1.466GiB/s
bench_allreduce(np=4) took 0.7733s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7723s, total workload: 1.144GiB, rate: 1.481GiB/s
bench_allreduce(np=4) took 0.7585s, total workload: 1.144GiB, rate: 1.508GiB/s
bench_allreduce(np=4) took 0.7731s, total workload: 1.144GiB, rate: 1.479GiB/s
bench_allreduce(np=4) took 0.7653s, total workload: 1.144GiB, rate: 1.494GiB/s
END ======================================== bench_allreduce remote ========================================

ba0d691

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0124s, total workload: 384000B, rate: 0.029GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0120s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0119s, total workload: 384000B, rate: 0.030GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0114s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.0115s, total workload: 384000B, rate: 0.031GiB/s
bench_allreduce(np=4) took 0.2637s, total workload: 1.144GiB, rate: 4.336GiB/s
bench_allreduce(np=4) took 0.2531s, total workload: 1.144GiB, rate: 4.519GiB/s
bench_allreduce(np=4) took 0.2512s, total workload: 1.144GiB, rate: 4.553GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.500GiB/s
bench_allreduce(np=4) took 0.2541s, total workload: 1.144GiB, rate: 4.501GiB/s
bench_allreduce(np=4) took 0.2536s, total workload: 1.144GiB, rate: 4.510GiB/s
bench_allreduce(np=4) took 0.2544s, total workload: 1.144GiB, rate: 4.495GiB/s
bench_allreduce(np=4) took 0.2535s, total workload: 1.144GiB, rate: 4.512GiB/s
bench_allreduce(np=4) took 0.2543s, total workload: 1.144GiB, rate: 4.497GiB/s
bench_allreduce(np=4) took 0.2528s, total workload: 1.144GiB, rate: 4.524GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3830s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3861s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3815s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3696s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3268s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3305s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3322s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3092s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3340s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.1843s, total workload: 1.144GiB, rate: 0.966GiB/s
bench_allreduce(np=4) took 1.0431s, total workload: 1.144GiB, rate: 1.096GiB/s
bench_allreduce(np=4) took 1.0408s, total workload: 1.144GiB, rate: 1.099GiB/s
bench_allreduce(np=4) took 0.9793s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9878s, total workload: 1.144GiB, rate: 1.158GiB/s
bench_allreduce(np=4) took 1.0487s, total workload: 1.144GiB, rate: 1.091GiB/s
bench_allreduce(np=4) took 0.9917s, total workload: 1.144GiB, rate: 1.153GiB/s
bench_allreduce(np=4) took 1.0071s, total workload: 1.144GiB, rate: 1.136GiB/s
bench_allreduce(np=4) took 0.9798s, total workload: 1.144GiB, rate: 1.167GiB/s
bench_allreduce(np=4) took 1.0379s, total workload: 1.144GiB, rate: 1.102GiB/s
END ======================================== bench_allreduce remote ========================================

71d3f79

BGN ======================================== bench_allreduce local ========================================
bench_allreduce(np=4) took 0.0038s, total workload: 384000B, rate: 0.093GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.098GiB/s
bench_allreduce(np=4) took 0.0037s, total workload: 384000B, rate: 0.097GiB/s
bench_allreduce(np=4) took 0.0035s, total workload: 384000B, rate: 0.103GiB/s
bench_allreduce(np=4) took 0.0033s, total workload: 384000B, rate: 0.108GiB/s
bench_allreduce(np=4) took 0.0032s, total workload: 384000B, rate: 0.110GiB/s
bench_allreduce(np=4) took 0.2252s, total workload: 1.144GiB, rate: 5.079GiB/s
bench_allreduce(np=4) took 0.2040s, total workload: 1.144GiB, rate: 5.606GiB/s
bench_allreduce(np=4) took 0.2034s, total workload: 1.144GiB, rate: 5.622GiB/s
bench_allreduce(np=4) took 0.2023s, total workload: 1.144GiB, rate: 5.652GiB/s
bench_allreduce(np=4) took 0.2098s, total workload: 1.144GiB, rate: 5.450GiB/s
bench_allreduce(np=4) took 0.2046s, total workload: 1.144GiB, rate: 5.590GiB/s
bench_allreduce(np=4) took 0.2052s, total workload: 1.144GiB, rate: 5.573GiB/s
bench_allreduce(np=4) took 0.2044s, total workload: 1.144GiB, rate: 5.596GiB/s
bench_allreduce(np=4) took 0.2060s, total workload: 1.144GiB, rate: 5.551GiB/s
bench_allreduce(np=4) took 0.2059s, total workload: 1.144GiB, rate: 5.553GiB/s
END ======================================== bench_allreduce local ========================================
BGN ======================================== bench_allreduce remote ========================================
bench_allreduce(np=4) took 0.3308s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3140s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3002s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3112s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3261s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3326s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3107s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.3072s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2969s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 0.2862s, total workload: 384000B, rate: 0.001GiB/s
bench_allreduce(np=4) took 1.0909s, total workload: 1.144GiB, rate: 1.048GiB/s
bench_allreduce(np=4) took 0.9866s, total workload: 1.144GiB, rate: 1.159GiB/s
bench_allreduce(np=4) took 0.9789s, total workload: 1.144GiB, rate: 1.168GiB/s
bench_allreduce(np=4) took 0.9276s, total workload: 1.144GiB, rate: 1.233GiB/s
bench_allreduce(np=4) took 0.9301s, total workload: 1.144GiB, rate: 1.230GiB/s
bench_allreduce(np=4) took 0.9553s, total workload: 1.144GiB, rate: 1.197GiB/s
bench_allreduce(np=4) took 0.9490s, total workload: 1.144GiB, rate: 1.205GiB/s
bench_allreduce(np=4) took 0.9726s, total workload: 1.144GiB, rate: 1.176GiB/s
bench_allreduce(np=4) took 0.8959s, total workload: 1.144GiB, rate: 1.277GiB/s
bench_allreduce(np=4) took 0.9749s, total workload: 1.144GiB, rate: 1.173GiB/s
END ======================================== bench_allreduce remote ========================================
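
To compare the three runs above without reading every line, the logs can be reduced to one mean time per section and workload. The following is a minimal sketch, assuming the log format shown in this comment; `summarize` and `rate_gib_per_s` are hypothetical helper names, not part of the repository or of the benchmark tool.

```python
import re
from statistics import mean

# Matches lines such as:
# bench_allreduce(np=4) took 0.0130s, total workload: 384000B, rate: 0.028GiB/s
LINE_RE = re.compile(
    r"bench_allreduce\(np=(\d+)\) took ([\d.]+)s, "
    r"total workload: ([\d.]+)(B|GiB), rate: ([\d.]+)GiB/s"
)


def rate_gib_per_s(workload_bytes, seconds):
    """Throughput in GiB/s, assuming rate = workload / elapsed time."""
    return workload_bytes / seconds / 2**30


def summarize(log_text):
    """Return {(section, workload): mean seconds} for one pasted log.

    `section` is the word after "bench_allreduce" on a BGN line
    ("local" or "remote" in the logs above).
    """
    groups = {}
    section = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("BGN"):
            section = line.split("bench_allreduce", 1)[1].split()[0]
            continue
        match = LINE_RE.search(line)
        if match:
            workload = match.group(3) + match.group(4)  # e.g. "384000B"
            groups.setdefault((section, workload), []).append(float(match.group(2)))
    return {key: mean(times) for key, times in groups.items()}
```

Running `summarize` once per commit block gives one table per commit, which can then be compared side by side. The rate helper only reflects the assumed relationship between the reported fields; for the large workload the log prints a rounded size, so recomputed rates can differ in the last digit.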

lgarithm#2 (comment)

