This repository has been archived by the owner on Jan 26, 2021. It is now read-only.
# Lines 1-10 of test_multi.py are not shown in the report; the imports below
# are assumed from the names used in the test.
import numpy as np
import theano
import multiverso as mv
from multiverso.theano_ext import sharedvar


class TestMultiversoSharedVariable:
    def _test_sharedvar(self, row, col):
        # Shared variable W, initialised to zeros and registered with Multiverso.
        W = sharedvar.mv_shared(
            value=np.zeros(
                (row, col),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # delta holds 1, 2, ..., row * col, so each element's expected value is easy to predict.
        delta = np.array(range(1, row * col + 1),
                         dtype=theano.config.floatX).reshape((row, col))
        train_model = theano.function([], updates=[(W, W + delta)])
        for i in range(10):
            # Each worker adds delta to its local W twice per iteration.
            train_model()
            train_model()
            sharedvar.sync_all_mv_shared_vars()  # send local updates to the server
            # mv.barrier()
            # to get the newest value, we must sync again
            mv.barrier()
            sharedvar.sync_all_mv_shared_vars()
            # Print: iteration, element index, expected value, actual value.
            for j, actual in enumerate(W.get_value().reshape(-1)):
                print("[%d] %d %d %d" % (i, j, (j + 1) * (i + 1) * 2 * mv.workers_num(), actual))

    def test_sharedvar(self):
        self._test_sharedvar(10, 10)


if __name__ == '__main__':
    mv.init()
    test_shared = TestMultiversoSharedVariable()
    test_shared.test_sharedvar()
    mv.shutdown()
I ran this test and found that it works when I start one worker per node,
but when I start two workers per node, all workers block.
mpirun -hostfile alg_cluster.txt -npernode 1 python test_multi.py
mpirun -hostfile alg_cluster.txt -npernode 2 python test_multi.py
There are three IPs (nodes) in my cluster.
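For reference, the expected value printed by the test, `(j + 1) * (i + 1) * 2 * mv.workers_num()`, can be checked without Multiverso or MPI. The sketch below is only a toy model: it assumes the server simply sums every worker's local change, which is what the formula in the test implies. `expected_after` and `simulate` are hypothetical helper names, not part of the Multiverso API.

```python
import numpy as np


def expected_after(iteration, row, col, workers_num):
    """Expected flattened W once loop pass `iteration` (0-based) has finished."""
    delta = np.arange(1, row * col + 1, dtype=np.float64)
    # Element j holds (j + 1) * (iteration + 1) * 2 * workers_num.
    return delta * (iteration + 1) * 2 * workers_num


def simulate(iterations, row, col, workers_num):
    """Toy stand-in for the server: accumulate each worker's change per pass.

    Assumption: the server adds up all workers' updates; this is not
    Multiverso's actual implementation.
    """
    delta = np.arange(1, row * col + 1, dtype=np.float64)
    server = np.zeros(row * col, dtype=np.float64)
    for _ in range(iterations):
        for _ in range(workers_num):
            # Each worker calls train_model() twice, so it contributes 2 * delta.
            server += 2 * delta
    return server


if __name__ == "__main__":
    # 3 nodes with 2 processes per node gives 6 workers.
    workers = 3 * 2
    print(simulate(1, 2, 2, workers))
    print(expected_after(0, 2, 2, workers))
```

With three nodes and `-npernode 2` there are six workers, so under this model element `j` should read `(j + 1) * 12` after the first pass, if the run gets that far.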
teki1981 changed the title from "why workers are blocked when I start 2 process in one node." to "why workers are blocked when I start 6 process in three node, 2 process per node" on Mar 8, 2017