Closed
Labels
bug (Something isn't working), module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
Please get the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/18029979174, or download them with gh:
gh run download 18029979174 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
git clone -b distributed_2.9 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
cd test/distributed/elastic/rendezvous/
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_adds_to_participants
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_adds_to_participants_and_completes_rendezvous_if_max_nodes_is_reached
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_adds_to_participants_and_starts_last_call_if_min_nodes_is_reached
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_adds_to_participants_if_node_was_in_waitlist
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_adds_to_waitlist
python -m unittest dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest.test_run_keeps_alive
======================================================================
FAIL: test_run_adds_to_participants (dynamic_rendezvous_test.DistributedRendezvousOpExecutorTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sdp/xiangdong/pytorch/test/distributed/elastic/rendezvous/dynamic_rendezvous_test.py", line 645, in test_run_adds_to_participants
    self._assert_action(_Action.ADD_TO_PARTICIPANTS, expected_state)
  File "/home/sdp/xiangdong/pytorch/test/distributed/elastic/rendezvous/dynamic_rendezvous_test.py", line 605, in _assert_action
    self.assert_state_equal(self._state, expected_state)
  File "/home/sdp/xiangdong/pytorch/test/distributed/elastic/rendezvous/dynamic_rendezvous_test.py", line 65, in assert_state_equal
    self.assertDictEqual(vars(actual), vars(expected))
AssertionError: {'rou[172 chars]e_1_1: <MagicMock name='datetime.now()' id='134812498392880'>}} != {'rou[172 chars]e_1_1: datetime.datetime(2000, 1, 1, 0, 0)}}
  {'closed': False,
   'complete': False,
   'deadline': None,
-  'last_heartbeats': {this_node_1_1: <MagicMock name='datetime.now()' id='134812498392880'>},
+  'last_heartbeats': {this_node_1_1: datetime.datetime(2000, 1, 1, 0, 0)},
   'participants': {this_node_1_1: 0},
   'redundancy_list': set(),
   'round': 0,
   'wait_list': set()}
----------------------------------------------------------------------
Ran 1 test in 0.005s
FAILED (failures=1)
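The `<MagicMock name='datetime.now()' ...>` value in the diff suggests that the code under test called `now()` on a patched `datetime` object whose `now` had no return value configured. The sketch below reproduces that symptom in isolation. It is an illustration, not the actual test or rendezvous code: `fake_module` and `record_heartbeat` are hypothetical stand-ins, and the assumption that the test stubs `utcnow()` while the code calls `now()` is one possible cause that has not been confirmed here.

```python
from datetime import datetime
from types import SimpleNamespace
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for a module that imports `datetime` at module level.
fake_module = SimpleNamespace(datetime=datetime)


def record_heartbeat(state):
    # Hypothetical code under test: stamps the heartbeat using now().
    state["last_heartbeat"] = fake_module.datetime.now()


fixed = datetime(2000, 1, 1)

# Case 1 (assumed mismatch): only utcnow() is stubbed, but the code calls now().
with patch.object(fake_module, "datetime") as mock_datetime:
    mock_datetime.utcnow.return_value = fixed
    state = {}
    record_heartbeat(state)
    unstubbed = state["last_heartbeat"]

# Case 2: the stub matches the method that the code actually calls.
with patch.object(fake_module, "datetime") as mock_datetime:
    mock_datetime.now.return_value = fixed
    state = {}
    record_heartbeat(state)
    stubbed = state["last_heartbeat"]

# In case 1 the stored heartbeat is a MagicMock rather than a datetime.
print(isinstance(unstubbed, MagicMock))
# In case 2 the stored heartbeat equals the fixed timestamp.
print(stubbed == fixed)
```

If this is what happens here, the test file and the installed wheel would disagree about which `datetime` method is used, so comparing the two versions of `dynamic_rendezvous.py` and `dynamic_rendezvous_test.py` may be a useful first check.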
Versions
pytorch: https://github.com/daisyden/pytorch/tree/distributed_2.9