Setting the seed and repeating fused neighborhood sampling for the same source node does not reproduce the same subgraph. I have identified a fix that is slower but makes the subgraphs reproducible.
To Reproduce
Steps to reproduce the behavior:
1. Set OMP_NUM_THREADS to >= 2.
2. Define NeighborSampler with fused=True.
3. Set dgl.seed and dgl.random.seed before calling dgl.dataloading.NeighborSampler.sample_blocks.
4. Repeat step 3 and compare blocks[0].srcdata['feat'] between runs.
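The steps above can be sketched as a small harness. This is an illustration under assumptions, not the reporter's script: the helper names (sample_src_ids, is_reproducible, demo), the random graph, and the fanouts are hypothetical, and the sketch compares the source node IDs in blocks[0].srcdata[dgl.NID] rather than 'feat', since no features are attached to the toy graph.

```python
import os

# Step 1: request at least two OpenMP threads. setdefault keeps any value
# already exported in the shell. This must run before dgl is first imported,
# which is why the dgl imports below are deferred into the functions.
os.environ.setdefault("OMP_NUM_THREADS", "2")


def sample_src_ids(g, seed_nodes, fanouts, seed):
    """Seed DGL, run one fused sampling pass, return blocks[0] source IDs."""
    import dgl

    # Step 3: seed both generators named in the report.
    dgl.seed(seed)
    dgl.random.seed(seed)
    # Step 2: build the sampler with the fused code path enabled.
    sampler = dgl.dataloading.NeighborSampler(fanouts, fused=True)
    _, _, blocks = sampler.sample_blocks(g, seed_nodes)
    return blocks[0].srcdata[dgl.NID].tolist()


def is_reproducible(sample_fn, runs=2):
    """Step 4: call sample_fn repeatedly and check every result matches."""
    first = sample_fn()
    return all(sample_fn() == first for _ in range(runs - 1))


def demo():
    """Hypothetical driver on a random graph; sizes are arbitrary."""
    import dgl
    import torch

    g = dgl.rand_graph(1000, 20000)
    seeds = torch.arange(64)
    return is_reproducible(lambda: sample_src_ids(g, seeds, [10, 10], seed=0))
```

Calling demo() in an environment with DGL installed returns True when repeated runs agree and False otherwise; per the report, False is expected with fused sampling and two or more threads.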
The following change made the steps above produce the same output:
Yes, this is a known issue: our implementation of concurrent_id_hash_map can't guarantee determinism while maintaining high performance. Feel free to file a PR that adds an if-else branch to use a deterministic solution when the user specifies a random seed, or that adds another flag to control the behavior.
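To make the suggested if-else branch concrete, here is a toy model in plain Python. It is not DGL's concurrent_id_hash_map (which is C++); the function name build_id_map and the deterministic flag are hypothetical. It only illustrates why assigning compact local IDs from several threads gives a valid but scheduling-dependent numbering, while a single ordered pass always gives the same one.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_id_map(global_ids, deterministic=True, num_threads=4):
    """Map each distinct global node ID to a compact local ID in 0..n-1."""
    mapping = {}

    if deterministic:
        # Single pass in input order: each ID gets its local ID at first
        # occurrence, so the same input always yields the same mapping.
        for gid in global_ids:
            if gid not in mapping:
                mapping[gid] = len(mapping)
        return mapping

    # Parallel branch: threads race to insert. The lock keeps the mapping a
    # valid bijection, but which ID gets which local ID depends on scheduling.
    lock = threading.Lock()

    def insert(chunk):
        for gid in chunk:
            with lock:
                if gid not in mapping:
                    mapping[gid] = len(mapping)

    chunks = [global_ids[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(insert, chunks))
    return mapping
```

In this sketch the deterministic branch gives up parallelism entirely, which mirrors the slowdown described in the report; a real implementation could instead take the fast path and renumber the result in a canonical order afterwards, at the cost of an extra pass.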
This change makes sampling slower but reproducible. Is there an alternative that keeps multithreaded fused sampling reproducible?
Environment
How DGL was installed (conda, pip, source): source