Nasty segfault when running experiments just from run.py #8

Open
sgpthomas opened this issue Jul 17, 2017 · 8 comments

@sgpthomas

https://github.com/DSEF/janus. Here is my updated code for janus. I've just ported things to python3, which involved pretty minimal changes: I needed to change the cpp python module stuff, update the build system, and change ps.py. Running ./dsef/run.py will run two experiments. If you take a look at that code, you will see it's basically the original run.py, reorganized a bit, with some RPyC stuff added to make it easier to call from the DSEF server and send data back and forth. The current version has all the RPyC server code commented out to try to isolate this segmentation fault.

I believe that the segmentation fault comes from something in _pyrpc not resetting state properly between runs. Previously this wasn't a problem because the Python instance exited completely between experiments.
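One way to test that hypothesis is to restore the old behaviour, where each experiment gets its own interpreter and any state held by the native module is discarded when the process exits. The following is only a minimal sketch: `run_in_fresh_interpreter` is a hypothetical helper, and the code strings in the demo are stand-ins for whatever actually runs one experiment. It relies on the standard `subprocess` behaviour that, on POSIX, a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_in_fresh_interpreter(code):
    """Run `code` in a new Python process and report how it ended.

    Hypothetical helper: in practice `code` would be replaced by whatever
    launches a single experiment.
    """
    proc = subprocess.run([sys.executable, "-c", code])
    rc = proc.returncode
    if rc < 0:
        # On POSIX, -N means the child was terminated by signal N
        # (for example SIGSEGV for a segmentation fault).
        return rc, "killed by " + signal.Signals(-rc).name
    return rc, "exited with code {}".format(rc)


if __name__ == "__main__":
    # Stand-ins for two experiments, each run in its own interpreter.
    for snippet in ["print('experiment 1')", "print('experiment 2')"]:
        print(run_in_fresh_interpreter(snippet))
```

If the second experiment survives when run this way, that would be consistent with state leaking between runs inside a single process, though it would not by itself show where the state lives.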

I would greatly appreciate any insight into what might be causing this.

P.S. This is built from an old version of Janus, commit f45fd04

@shuaimu
Contributor

shuaimu commented Jul 17, 2017

So does the original run.py work (without the changes)?

@sgpthomas
Author

Yes, the problem only happens if you try running two experiments from one run.py script. When you run the modified version, notice that the first experiment runs fine; it only crashes halfway through the second experiment.
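Since the crash happens partway through the second experiment, it may help to see which Python frame was calling into the native module at that moment. Python 3's standard `faulthandler` module can dump the Python traceback of every thread when the process receives SIGSEGV. This is a sketch only: `enable_crash_traces` and the log path are hypothetical, and the traceback covers Python frames rather than the C++ stack.

```python
import faulthandler


def enable_crash_traces(path):
    """Dump Python tracebacks for all threads to `path` on a fatal signal.

    Hypothetical helper. The returned file object must stay open for as
    long as faulthandler is enabled, because faulthandler writes to its
    file descriptor at crash time.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log


if __name__ == "__main__":
    # Assumed location; any writable path works.
    crash_log = enable_crash_traces("segfault-traceback.log")
    # ... run the two experiments here ...
```

Calling this once at the top of the script, before the first experiment, would leave a traceback in the log file if the second one segfaults.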

@sgpthomas
Author

Here's a full log:

logging to file: /users/sgt43/janus/log/dsef-tpca_occ_multi_paxos_1_-1.log
DEBUG: logger initialized
Launch config: {'host': {'node-10': 'node-10', 'node-14': 'node-14', 'node-12': 'node-12', 'node-15': 'node-15', 'node-1': 'node-1', 'node-13': 'node-13', 'node-2': 'node-2', 'node-19': 'node-19', 'node-5': 'node-5', 'node-3': 'node-3', 'node-16': 'node-16', 'node-18': 'node-18', 'node-20': 'node-20', 'node-9': 'node-9', 'node-7': 'node-7', 'node-6': 'node-6', 'node-17': 'node-17', 'node-11': 'node-11', 'node-4': 'node-4', 'node-8': 'node-8'}, 'bench': {'dist': 'uniform', 'weight': {'write': 1, 'read': 0}, 'workload': 'tpca', 'coefficient': 1, 'scale': 1, 'population': {'teller': 100000, 'branch': 100000, 'customer': 100000}}, 'mode': {'batch': False, 'ongoing': 1, 'retry': 20, 'cc': 'occ', 'read_only': 'occ', 'ab': 'multi_paxos'}, 'process': {'c0': 'node-20', 's4': 'node-10', 's3': 'node-1', 's2': 'node-11', 's0': 'node-1', 's5': 'node-11', 's1': 'node-10'}, 'n_concurrent': 1, 'client': {'type': 'closed', 'rate': -1, 'forwarding': False}, 'sharding': {'teller': 'MOD', 'branch': 'MOD', 'customer': 'MOD'}, 'site': {'server': [['s0:10000', 's1:10001', 's2:10002'], ['s3:10003', 's4:10004', 's5:10005']], 'client': [['c0']]}, 'schema': [{'name': 'branch', 'column': [{'type': 'integer', 'primary': True, 'name': 'branch_id'}, {'type': 'integer', 'name': 'balance'}]}, {'name': 'teller', 'column': [{'type': 'integer', 'primary': True, 'name': 'teller_id'}, {'type': 'integer', 'foreign': 'branch.branch_id', 'name': 'branch_id'}]}, {'name': 'customer', 'column': [{'type': 'integer', 'primary': True, 'name': 'customer_id'}, {'type': 'integer', 'foreign': 'branch.branch_id', 'name': 'branch_id'}, {'type': 'integer', 'name': 'balance'}]}], 'args': <__main__.TrialConfig object at 0x7f8b9cdd6ba8>}
INFO: process node-20 0 node-20:5555
INFO: process node-1 1 node-1:5556
INFO: process node-10 2 node-10:5557
INFO: process node-11 3 node-11:5558
INFO: add_site: s0, server, 10000
INFO: add_site: s1, server, 10001
INFO: add_site: s2, server, 10002
INFO: add_site: s3, server, 10003
INFO: add_site: s4, server, 10004
INFO: add_site: s5, server, 10005
INFO: add_site: c0, client, 5555
INFO: process infos: {'node-10': <__main__.ProcessInfo object at 0x7f8b9cdd6278>, 'node-20': <__main__.ProcessInfo object at 0x7f8b9cdd63c8>, 'node-1': <__main__.ProcessInfo object at 0x7f8b9cdd6198>, 'node-11': <__main__.ProcessInfo object at 0x7f8b9cdd62e8>}
DEBUG: timeout waiting for value from: 0
DEBUG: Existing Server or Client Processes:
----------
Server: node-10
-------------
sgt43     1644  1643  1644  0    1  3128  2920   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1646  1644  1646  0    1  3556   960   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1678  1677  1678  0    1  3128  2864   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1680  1678  1680  0    1  3556   968   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-20
-------------
sgt43     1745  1744  1745  0    1  3128  2868   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1747  1745  1747  0    1  3556   968   1 11:39 ?        00:00:00 grep deptran_server

INFO: waiting for killall commands to finish.
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
INFO: done waiting for killall commands to finish.
DEBUG: Existing Server or Client After Kill:
----------
Server: node-20
-------------
sgt43     1787  1786  1787  0    1  3128  2920   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1789  1787  1789  0    1  3556   964   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-10
-------------
sgt43     1686  1685  1686  0    1  3128  2924   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1688  1686  1688  0    1  3556   936   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1719  1718  1719  0    1  3128  2952   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1721  1719  1721  0    1  3556   936   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-11
-------------
sgt43     1794  1793  1794  0    1  3128  2936   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1796  1794  1796  0    1  3556   976   1 11:39 ?        00:00:00 grep deptran_server

INFO: No taskset, auto scheduling
DEBUG: before server_controller.start
DEBUG: {'node-10': <__main__.ProcessInfo object at 0x7f8b9cdd6278>, 'node-20': <__main__.ProcessInfo object at 0x7f8b9cdd63c8>, 'node-1': <__main__.ProcessInfo object at 0x7f8b9cdd6198>, 'node-11': <__main__.ProcessInfo object at 0x7f8b9cdd62e8>}
INFO: starting node-10 @ node-10
INFO: starting node-20 @ node-20
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-10' -p 5557 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-10.log' 2>'/users/sgt43/janus/log/proc-node-10.err' &
INFO: starting node-1 @ node-1
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-20' -p 5555 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-20.log' 2>'/users/sgt43/janus/log/proc-node-20.err' &
INFO: starting node-11 @ node-11
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-1' -p 5556 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-1.log' 2>'/users/sgt43/janus/log/proc-node-1.err' &
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-11' -p 5558 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-11.log' 2>'/users/sgt43/janus/log/proc-node-11.err' &
DEBUG: after server_controller.start
DEBUG: in setup_heartbeat
INFO: Waiting for server init ...
DEBUG: in server_heart_beat
DEBUG: in connect_rpc 1
INFO: start connect to server ctrl rpc for site s1 @ node-10:20001
E [client.cpp:139] 2017-07-17 11:39:36.206 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.306 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.407 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.507 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.608 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.708 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.809 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:36.909 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.009 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.110 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.210 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.312 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.413 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.513 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.614 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.715 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.816 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:37.917 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.018 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.119 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.220 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.320 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.421 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.521 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.621 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.722 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.823 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:38.925 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:39.025 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:39.127 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:39.227 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:39.328 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:39:39.429 | rrr::Client: connect(node-10:20001): Connection refused
D [client.cpp:144] 2017-07-17 11:39:39.530 | rrr::Client: connected to node-10:20001
INFO: Connected to site s1 @ node-10
DEBUG: in connect_rpc 4
INFO: start connect to server ctrl rpc for site s4 @ node-10:20004
D [client.cpp:144] 2017-07-17 11:39:39.633 | rrr::Client: connected to node-10:20004
INFO: Connected to site s4 @ node-10
DEBUG: in connect_rpc 0
INFO: start connect to server ctrl rpc for site s0 @ node-1:20000
D [client.cpp:144] 2017-07-17 11:39:39.735 | rrr::Client: connected to node-1:20000
INFO: Connected to site s0 @ node-1
DEBUG: in connect_rpc 3
INFO: start connect to server ctrl rpc for site s3 @ node-1:20003
D [client.cpp:144] 2017-07-17 11:39:39.837 | rrr::Client: connected to node-1:20003
INFO: Connected to site s3 @ node-1
DEBUG: in connect_rpc 2
INFO: start connect to server ctrl rpc for site s2 @ node-11:20002
D [client.cpp:144] 2017-07-17 11:39:39.939 | rrr::Client: connected to node-11:20002
INFO: Connected to site s2 @ node-11
DEBUG: in connect_rpc 5
INFO: start connect to server ctrl rpc for site s5 @ node-11:20005
D [client.cpp:144] 2017-07-17 11:39:40.042 | rrr::Client: connected to node-11:20005
INFO: Connected to site s5 @ node-11
INFO: call sync_server_ready on site 1
INFO: site s1 ready
INFO: call sync_server_ready on site 4
INFO: site s4 ready
INFO: call sync_server_ready on site 0
INFO: site s0 ready
INFO: call sync_server_ready on site 3
INFO: site s3 ready
INFO: call sync_server_ready on site 2
INFO: site s2 ready
INFO: call sync_server_ready on site 5
INFO: site s5 ready
DEBUG: top server heartbeat loop
INFO: Waiting for server init ... Done
DEBUG: ping s1
DEBUG: in connect_rpc 6
DEBUG: ping s4
DEBUG: ping s0
INFO: start connect to client ctrl rpc for site c0 @ node-20:5555
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
D [client.cpp:144] 2017-07-17 11:39:40.154 | rrr::Client: connected to node-20:5555
INFO: Connected to client site c0 @ node-20
INFO: Clients all ready
DEBUG: txn: b'PAYMENT' - 10
INFO: client start send successfully.
INFO: Clients started
DEBUG: in benchmark_record
INFO: contact 1 client rpc proxies
DEBUG: top client heartbeat; timeout 5
DEBUG: before recording period: 10
DEBUG: mid_pre_commit_txn (+1): 1
DEBUG: timing from server: run_sec 0.00; run_nsec 2034036.00
DEBUG: avg timing from 1 servers: run_sec 0.00; run_nsec 2034036.00
INFO: Progress: 0
INFO: total_time: 0.002034036
INFO: 
Progress: 0%
TOTAL: elapsed time: 0.0
RATIO    NAME       start    finish    attempts    commits    TPS
-------  -------  -------  --------  ----------  ---------  -----
100.0%   PAYMENT        2         1           1          1    492
----     Total          2         1           1          1    492

INTERVAL: elapsed time: 0.0
RATIO    NAME       start    finish    attempts    commits    TPS    min lt    max lt    50.0% LATENCY    90.0% LATENCY    99.0% LATENCY    99.9% LATENCY    50.0% ATT_LT    90.0% ATT_LT
-------  -------  -------  --------  ----------  ---------  -----  --------  --------  ---------------  ---------------  ---------------  ---------------  --------------  --------------
100.0%   PAYMENT        2         1           1          1    492   99999.9   99999.9          99999.9          99999.9          99999.9          99999.9         99999.9         99999.9
----     Total          2         1           1          1    492       0         0                0                0                0                0               0               0
	Total asking finish: 140497429904115
----------------------------------------------------------------------

DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
DEBUG: top client heartbeat; timeout 5
DEBUG: before recording period: 10
DEBUG: mid_pre_commit_txn (+3885): 3885
DEBUG: timing from server: run_sec 5.00; run_nsec 9735026.00
DEBUG: avg timing from 1 servers: run_sec 5.00; run_nsec 9735026.00
INFO: Progress: 50
INFO: start recording period
INFO: total_time: 5.009735026
INFO: 
Progress: 50%
TOTAL: elapsed time: 5.01
RATIO    NAME       start    finish    attempts    commits    TPS
-------  -------  -------  --------  ----------  ---------  -----
100.0%   PAYMENT     3886      3885        3886       3885    775
----     Total       3886      3885        3886       3885    775

INTERVAL: elapsed time: 5.01
RATIO    NAME       start    finish    attempts    commits    TPS    min lt    max lt    50.0% LATENCY    90.0% LATENCY    99.0% LATENCY    99.9% LATENCY    50.0% ATT_LT    90.0% ATT_LT
-------  -------  -------  --------  ----------  ---------  -----  --------  --------  ---------------  ---------------  ---------------  ---------------  --------------  --------------
100.0%   PAYMENT     3884      3884        3885       3884    776   99999.9   99999.9          99999.9          99999.9          99999.9          99999.9         99999.9         99999.9
----     Total       3884      3884        3885       3884    776       0         0                0                0                0                0               0               0
	Total asking finish: 140497429906163
----------------------------------------------------------------------

DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
INFO: CPU 0: 0.30354957160342716
INFO: CPU 1: -1.0
INFO: CPU 2: 1.4694376528117359
INFO: CPU 3: -1.0
INFO: CPU 4: 0.2998776009791922
INFO: CPU 5: -1.0
DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
DEBUG: top client heartbeat; timeout 5
DEBUG: during recording period!!! 10
DEBUG: timing from server: run_sec 10.00; run_nsec 60842377.00
DEBUG: avg timing from 1 servers: run_sec 10.00; run_nsec 60842377.00
INFO: Progress: 101
INFO: done with recording period
INFO: mid_commit_txn: 7865
INFO: mid_pre_commit_txn: 3885
INFO: mid_time = 5.051107351
INFO: percent: 0.5
INFO: percent: 0.9
INFO: percent: 0.99
INFO: percent: 0.999
INFO: 
__Data__
ab: multi_paxos
all_latency: {'50.0': 1.22021484375, '90.0': 1.385986328125, '99.0': 1.527099609375,
  '99.9': 1.765625, max: 3.478271484375, min: 0.711181640625}
att_latency: {'50.0': 1.220703125, '90.0': 1.38671875, '99.0': 1.52734375, '99.9': 1.76611328125,
  max: 3.478759765625, min: 0.71142578125}
attempts: 3980
benchmark: tpca
cc: occ
clients: 1
commits: 3980
duration: 5.051107351
experiment_id: 1
latency: {'50.0': 1.22021484375, '90.0': 1.385986328125, '99.0': 1.527099609375, '99.9': 1.765625,
  max: 3.478271484375, min: 0.711181640625}
retries_exhausted_cnt: 0
start_cnt: 3979
total_cnt: 3980
tps: 787.9460331034252
txn_name: PAYMENT
zipf: 1

__EndData__

INFO: total_time: 10.060842377
INFO: 
Progress: 101%
TOTAL: elapsed time: 10.06
RATIO    NAME       start    finish    attempts    commits    TPS
-------  -------  -------  --------  ----------  ---------  -----
100.0%   PAYMENT     7865      7865        7866       7865    782
----     Total       7865      7865        7866       7865    782

INTERVAL: elapsed time: 5.05
RATIO    NAME       start    finish    attempts    commits    TPS    min lt    max lt    50.0% LATENCY    90.0% LATENCY    99.0% LATENCY    99.9% LATENCY    50.0% ATT_LT    90.0% ATT_LT
-------  -------  -------  --------  ----------  ---------  -----  --------  --------  ---------------  ---------------  ---------------  ---------------  --------------  --------------
100.0%   PAYMENT     3979      3980        3980       3980    788   99999.9   99999.9          99999.9          99999.9          99999.9          99999.9         99999.9         99999.9
----     Total       3979      3980        3980       3980    788       0         0                0                0                0                0               0               0
	Total asking finish: 140497429908211
----------------------------------------------------------------------

DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
INFO: CPU 0: 0.2055888223552894
INFO: CPU 1: -1.0
INFO: CPU 2: 0.8685258964143426
INFO: CPU 3: -1.0
INFO: CPU 4: 0.20159680638722555
INFO: CPU 5: -1.0
DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
INFO: Duration: 15.16 seconds
INFO: Benchmark finished
DEBUG: top server heartbeat loop
DEBUG: ping s1
DEBUG: ping s4
DEBUG: ping s0
DEBUG: ping s3
DEBUG: ping s2
DEBUG: ping s5
INFO: CPUINFO: -0.28738141247385707
INFO: AVG_LOG_FLUSH_CNT: -1.0
INFO: AVG_LOG_FLUSH_SZ: -1.0
INFO: BENCHMARK SUCCESS!
DEBUG: Existing Server or Client Processes:
----------
Server: node-20
-------------
sgt43     1813     1  1813  0   10 34648 20684   1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1814  0   10 34648 20684   1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1815  0   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1816  0   10 34648 20684   1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1817  1   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1818  1   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1819  1   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1820  1   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1821  8   10 34648 20684   1 11:39 ?        00:00:01 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1813     1  1827  0   10 34648 20684   0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-20 -p 5555 -t 10 -r /users/sgt43/janus/log
sgt43     1849  1848  1849  0    1  3128  2960   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1851  1849  1851  0    1  3556   960   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-10
-------------
sgt43     1712     1  1712  0   17 74078 123360  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1715  7   17 74078 123360  0 11:39 ?        00:00:01 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1716  0   17 74078 123360  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1717  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1718  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1719  7   17 74078 123360  0 11:39 ?        00:00:01 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1720  0   17 74078 123360  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1721  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1722  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1727  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1728  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1729  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1730  0   17 74078 123360  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1731  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1732  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1733  0   17 74078 123360  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1712     1  1734  0   17 74078 123360  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-10 -p 5557 -t 10 -r /users/sgt43/janus/log
sgt43     1756  1755  1756  0    1  3128  2868   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1758  1756  1758  0    1  3556   936   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-11
-------------
sgt43     1821     1  1821  0   17 74078 123308  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1824  7   17 74078 123308  0 11:39 ?        00:00:01 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1825  0   17 74078 123308  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1826  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1827  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1828  6   17 74078 123308  0 11:39 ?        00:00:01 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1829  0   17 74078 123308  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1830  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1831  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1836  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1837  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1838  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1839  0   17 74078 123308  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1840  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1841  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1842  0   17 74078 123308  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1821     1  1843  0   17 74078 123308  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-11 -p 5558 -t 10 -r /users/sgt43/janus/log
sgt43     1865  1864  1865  0    1  3128  2916   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1867  1865  1867  0    1  3556  1084   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1746     1  1746  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1749 21   17 83551 161344  0 11:39 ?        00:00:04 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1750  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1751  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1752  1   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1753 21   17 83551 161344  0 11:39 ?        00:00:04 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1754  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1755  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1756  1   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1757 18   17 83551 161344  0 11:39 ?        00:00:03 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1762 19   17 83551 161344  0 11:39 ?        00:00:03 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1763  1   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1764  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1765  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1766  1   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1767  0   17 83551 161344  1 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1746     1  1768  0   17 83551 161344  0 11:39 ?        00:00:00 ./build/deptran_server -b -d 10 -f dsef/janus-final-dsef-y2rllvt4.yml -P node-1 -p 5556 -t 10 -r /users/sgt43/janus/log
sgt43     1790  1789  1790  0    1  3128  2960   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1792  1790  1792  0    1  3556  1028   0 11:39 ?        00:00:00 grep deptran_server

INFO: waiting for killall commands to finish.
INFO: done waiting for killall commands to finish.
DEBUG: Existing Server or Client After Kill:
----------
Server: node-20
-------------
sgt43     1904  1903  1904  0    1  3128  2920   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1906  1904  1906  0    1  3556  1080   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-10
-------------
sgt43     1810  1809  1810  0    1  3128  2968   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1812  1810  1812  0    1  3556  1084   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-11
-------------
sgt43     1920  1919  1920  0    1  3128  2860   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1922  1920  1922  0    1  3556   984   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1845  1844  1845  0    1  3128  2924   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1847  1845  1847  0    1  3556  1028   0 11:39 ?        00:00:00 grep deptran_server


 ******************** 
 *  End
 ******************** 

logging to file: /users/sgt43/janus/log/dsef-tpca_occ_multi_paxos_1_-1.log
DEBUG: logger initialized
Launch config: {'host': {'node-10': 'node-10', 'node-14': 'node-14', 'node-12': 'node-12', 'node-15': 'node-15', 'node-1': 'node-1', 'node-13': 'node-13', 'node-2': 'node-2', 'node-19': 'node-19', 'node-5': 'node-5', 'node-3': 'node-3', 'node-16': 'node-16', 'node-18': 'node-18', 'node-20': 'node-20', 'node-9': 'node-9', 'node-7': 'node-7', 'node-6': 'node-6', 'node-17': 'node-17', 'node-11': 'node-11', 'node-4': 'node-4', 'node-8': 'node-8'}, 'bench': {'dist': 'uniform', 'weight': {'write': 1, 'read': 0}, 'workload': 'tpca', 'coefficient': 1, 'scale': 1, 'population': {'teller': 100000, 'branch': 100000, 'customer': 100000}}, 'mode': {'batch': False, 'ongoing': 1, 'retry': 20, 'cc': 'occ', 'read_only': 'occ', 'ab': 'multi_paxos'}, 'process': {'c0': 'node-20', 's4': 'node-10', 's3': 'node-1', 's2': 'node-11', 's0': 'node-1', 's5': 'node-11', 's1': 'node-10'}, 'n_concurrent': 1, 'client': {'type': 'closed', 'rate': -1, 'forwarding': False}, 'sharding': {'teller': 'MOD', 'branch': 'MOD', 'customer': 'MOD'}, 'site': {'server': [['s0:10000', 's1:10001', 's2:10002'], ['s3:10003', 's4:10004', 's5:10005']], 'client': [['c0']]}, 'schema': [{'name': 'branch', 'column': [{'type': 'integer', 'primary': True, 'name': 'branch_id'}, {'type': 'integer', 'name': 'balance'}]}, {'name': 'teller', 'column': [{'type': 'integer', 'primary': True, 'name': 'teller_id'}, {'type': 'integer', 'foreign': 'branch.branch_id', 'name': 'branch_id'}]}, {'name': 'customer', 'column': [{'type': 'integer', 'primary': True, 'name': 'customer_id'}, {'type': 'integer', 'foreign': 'branch.branch_id', 'name': 'branch_id'}, {'type': 'integer', 'name': 'balance'}]}], 'args': <__main__.TrialConfig object at 0x7f8b9e37b780>}
INFO: process node-20 0 node-20:5555
INFO: process node-1 1 node-1:5556
INFO: process node-10 2 node-10:5557
INFO: process node-11 3 node-11:5558
INFO: add_site: s0, server, 10000
INFO: add_site: s1, server, 10001
INFO: add_site: s2, server, 10002
INFO: add_site: s3, server, 10003
INFO: add_site: s4, server, 10004
INFO: add_site: s5, server, 10005
INFO: add_site: c0, client, 5555
INFO: process infos: {'node-10': <__main__.ProcessInfo object at 0x7f8b9cddb080>, 'node-20': <__main__.ProcessInfo object at 0x7f8b9cddbb00>, 'node-1': <__main__.ProcessInfo object at 0x7f8b9cddba90>, 'node-11': <__main__.ProcessInfo object at 0x7f8b9cddb550>}
DEBUG: Existing Server or Client Processes:
----------
Server: node-10
-------------
sgt43     1844  1843  1844  0    1  3128  2952   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1846  1844  1846  0    1  3556   960   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-20
-------------
sgt43     1938  1937  1938  0    1  3128  2968   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1940  1938  1940  0    1  3556   980   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1879  1878  1879  0    1  3128  2944   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1881  1879  1881  0    1  3556   972   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-11
-------------
sgt43     1953  1952  1953  0    1  3128  2916   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1955  1953  1955  0    1  3556   980   0 11:39 ?        00:00:00 grep deptran_server

INFO: waiting for killall commands to finish.
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
ERROR: host: 1; killall did not kill anything
INFO: done waiting for killall commands to finish.
DEBUG: Existing Server or Client After Kill:
----------
Server: node-10
-------------
sgt43     1910  1909  1910  0    1  3128  2916   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1912  1910  1912  0    1  3556   984   1 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-20
-------------
sgt43     2004  2003  2004  0    1  3128  2920   1 11:40 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     2006  2004  2006  0    1  3556   944   1 11:40 ?        00:00:00 grep deptran_server

----------
Server: node-1
-------------
sgt43     1945  1944  1945  0    1  3128  2868   0 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     1947  1945  1947  0    1  3556  1028   0 11:39 ?        00:00:00 grep deptran_server

----------
Server: node-11
-------------
sgt43     2020  2019  2020  0    1  3128  2924   1 11:39 ?        00:00:00 /bin/bash -c ps -eLF | grep "deptran_server"
sgt43     2022  2020  2022  0    1  3556   940   1 11:39 ?        00:00:00 grep deptran_server

INFO: No taskset, auto scheduling
DEBUG: before server_controller.start
DEBUG: {'node-10': <__main__.ProcessInfo object at 0x7f8b9cddb080>, 'node-20': <__main__.ProcessInfo object at 0x7f8b9cddbb00>, 'node-1': <__main__.ProcessInfo object at 0x7f8b9cddba90>, 'node-11': <__main__.ProcessInfo object at 0x7f8b9cddb550>}
INFO: starting node-10 @ node-10
INFO: starting node-20 @ node-20
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-10' -p 5557 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-10.log' 2>'/users/sgt43/janus/log/proc-node-10.err' &
INFO: starting node-1 @ node-1
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-20' -p 5555 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-20.log' 2>'/users/sgt43/janus/log/proc-node-20.err' &
INFO: starting node-11 @ node-11
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-1' -p 5556 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-1.log' 2>'/users/sgt43/janus/log/proc-node-1.err' &
DEBUG: running: cd /users/sgt43/janus;  mkdir -p /users/sgt43/janus/log;  nohup  ./build/deptran_server -b -d 10 -f 'dsef/janus-final-dsef-y2rllvt4.yml' -P 'node-11' -p 5558 -t 10 -r '/users/sgt43/janus/log' 1>'/users/sgt43/janus/log/proc-node-11.log' 2>'/users/sgt43/janus/log/proc-node-11.err' &
DEBUG: after server_controller.start
DEBUG: in setup_heartbeat
INFO: Waiting for server init ...
DEBUG: in server_heart_beat
DEBUG: in connect_rpc 1
INFO: start connect to server ctrl rpc for site s1 @ node-10:20001
E [client.cpp:139] 2017-07-17 11:40:00.872 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:00.973 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.074 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.174 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.275 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.375 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.476 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.577 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.677 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.778 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.878 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:01.979 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.080 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.180 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.281 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.381 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.482 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.583 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.683 | rrr::Client: connect(node-10:20001): Connection refused
E [client.cpp:139] 2017-07-17 11:40:02.784 | rrr::Client: connect(node-10:20001): Connection refused
D [client.cpp:144] 2017-07-17 11:40:02.884 | rrr::Client: connected to node-10:20001
INFO: Connected to site s1 @ node-10
DEBUG: in connect_rpc 4
INFO: start connect to server ctrl rpc for site s4 @ node-10:20004
D [client.cpp:144] 2017-07-17 11:40:02.986 | rrr::Client: connected to node-10:20004
INFO: Connected to site s4 @ node-10
DEBUG: in connect_rpc 0
INFO: start connect to server ctrl rpc for site s0 @ node-1:20000
D [client.cpp:144] 2017-07-17 11:40:03.088 | rrr::Client: connected to node-1:20000
INFO: Connected to site s0 @ node-1
DEBUG: in connect_rpc 3
INFO: start connect to server ctrl rpc for site s3 @ node-1:20003
D [client.cpp:144] 2017-07-17 11:40:03.190 | rrr::Client: connected to node-1:20003
INFO: Connected to site s3 @ node-1
DEBUG: in connect_rpc 2
INFO: start connect to server ctrl rpc for site s2 @ node-11:20002
D [client.cpp:144] 2017-07-17 11:40:03.291 | rrr::Client: connected to node-11:20002
INFO: Connected to site s2 @ node-11
DEBUG: in connect_rpc 5
INFO: start connect to server ctrl rpc for site s5 @ node-11:20005
D [client.cpp:144] 2017-07-17 11:40:03.393 | rrr::Client: connected to node-11:20005
INFO: Connected to site s5 @ node-11
INFO: call sync_server_ready on site 1
Segmentation fault

@sgpthomas
Author

INFO: call sync_server_ready on site 0

Thread 18 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2c63700 (LWP 2367)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff4d26101 in rrr::PollMgr::PollThread::poll_loop (this=0x1614040) at ../rrr/rpc/polling.cpp:164
#2  0x00007ffff4d272c9 in rrr::PollMgr::PollThread::start_poll_loop (arg=<optimized out>) at ../rrr/rpc/polling.cpp:49
#3  0x00007ffff7bc16fa in start_thread (arg=0x7ffff2c63700) at pthread_create.c:333
#4  0x00007ffff78f7b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


@shuaimu
Contributor

shuaimu commented Jul 17, 2017

I see. In our test setup, each run.py invocation covers only one test (one dot in the figure). It was not written to run multiple times. I assume there will be a lot of bugs if you try to run it multiple times within the same Python process, and the amount of work to fix those would be huge. So my suggestion is to simply call run.py as a subprocess (adding enough sleep time between runs to make sure the previous one has died), not as a function call.

@sgpthomas
Author

Ah OK, I suspected that might be the case. Thanks for looking at it.

@sgpthomas
Author

My hope was that I could minimize the amount of data serialization that needed to be done and keep the data as Python dictionaries for as long as possible.

@lamontnelson
Collaborator

I agree with Shuai that it may not be worth the effort, but another route to take could be to force the simplerpc module to re-initialize itself by shutting down, removing the module, and importing again. However, I have not tried this method.
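The re-import route could look roughly like the sketch below, which is likewise untested against Janus. `fresh_import` and its `shutdown` callback are hypothetical names. One caveat to keep in mind: CPython generally does not unload the shared library behind a C extension module, so static C++ state inside it may survive even after the module object is dropped and imported again.

```python
import importlib
import sys


def fresh_import(name, shutdown=None):
    """Drop `name` from the module cache and import it again.

    `shutdown` is an optional callable that receives the old module and
    releases its resources first (hypothetical hook; what it should do
    for the RPC module is not specified in this thread).
    """
    old = sys.modules.get(name)
    if old is not None and shutdown is not None:
        shutdown(old)
    # Remove the cached module so the next import runs the import
    # machinery again instead of returning the cached object.
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)
```

For a pure-Python module this yields a new module object with freshly initialized globals; whether it is enough for a native module depends on how that module keeps its state.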
