tpcc benchmark fails test #5

Open
sgpthomas opened this issue Jun 8, 2017 · 6 comments

@sgpthomas

This is likely a user error, and I would appreciate any help pointing me in the right direction. When I run test_run.py with mode none and the tpcc benchmark, the test fails with the following error:

I [s_main.cc:122] 2017-06-08 13:18:41.709 | PWD : 
I [s_main.cc:134] 2017-06-08 13:18:41.709 | starting process 4511
I [config.cc:268] 2017-06-08 13:18:41.710 | LoadYML: config/none.yml
I [config.cc:268] 2017-06-08 13:18:41.711 | LoadYML: config/1c1s1p.yml
I [config.cc:383] 2017-06-08 13:18:41.712 | BuildSiteProcMap
I [config.cc:268] 2017-06-08 13:18:41.712 | LoadYML: config/tpcc.yml
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [config.cc:632] 2017-06-08 13:18:41.722 | group size: 1
I [s_main.cc:26] 2017-06-08 13:18:41.723 | client_setup_heartbeat
I [s_main.cc:69] 2017-06-08 13:18:41.723 | server enabled, number of sites: 1
I [s_main.cc:94] 2017-06-08 13:18:41.724 | waiting for client setup threads.
I [s_main.cc:77] 2017-06-08 13:18:41.724 | launching site: 0, bind address 0.0.0.0:8101
I [server_worker.cc:78] 2017-06-08 13:18:41.731 | start data population for site 0
I [server_worker.cc:106] 2017-06-08 13:18:47.784 | data populated for site: 0, partition: 0
I [s_main.cc:85] 2017-06-08 13:18:47.784 | table popped for site 0
I [server_worker.cc:122] 2017-06-08 13:18:47.784 | enter SetupService for s101 @ 0.0.0.0:8101
I [server.cpp:434] 2017-06-08 13:18:47.784 | rrr::Server: started on 0.0.0.0:8101
I [server_worker.cc:173] 2017-06-08 13:18:47.784 | Server s101 ready at 0.0.0.0:8101
I [s_main.cc:88] 2017-06-08 13:18:47.784 | start communication for site 0
I [communicator.cc:157] 2017-06-08 13:18:47.784 | connect to site: 127.0.0.1:8101 (attempt 0)
I [s_main.cc:90] 2017-06-08 13:18:47.784 | site 0 launched!
I [s_main.cc:98] 2017-06-08 13:18:47.784 | done waiting for client setup threads.
I [s_main.cc:105] 2017-06-08 13:18:47.784 | server workers' communicators setup
I [s_main.cc:51] 2017-06-08 13:18:47.785 | client enabled, number of sites: 1
I [communicator.cc:157] 2017-06-08 13:18:47.785 | connect to site: 127.0.0.1:8101 (attempt 0)
I [s_main.cc:126] 2017-06-08 13:18:47.785 | wait_for_clients: wait for client threads to exit.
I [communicator.cc:69] 2017-06-08 13:18:47.792 | Done waiting to connect to client leaders.
I [client_worker.cc:137] 2017-06-08 13:18:47.792 | closed loop clients.
I [client_worker.cc:115] 2017-06-08 13:18:47.792 | coordinator 65536 created at site 1: forward 0
I [client_worker.cc:224] 2017-06-08 13:18:47.792 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 13:18:47.797 | start txn!!! : 0
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
@shuaimu
Contributor

shuaimu commented Jun 8, 2017

Can you give the command line you are using?

@sgpthomas
Author

I'm running ./test_run.py -m none -s 1c1s1p -b tpcc. I get the same result when I change the site.

sgt43@node-0:~/janus$ ./test_run.py -m none -s 1c1s1p -b tpcc
mode           site      bench     concurrent     result 	 time 
none           1c1s1p    tpcc      concurrent_1   Failed 	 4.99s
none           1c1s1p    tpcc      concurrent_10  Failed 	 5.04s
none           1c1s1p    tpcc      concurrent_100 Failed 	 4.99s
sgt43@node-0:~/janus$ cat none-1c1s1p-tpcc.res 
I [s_main.cc:122] 2017-06-08 14:09:52.247 | PWD : 
I [s_main.cc:134] 2017-06-08 14:09:52.247 | starting process 10256
I [config.cc:268] 2017-06-08 14:09:52.247 | LoadYML: config/none.yml
I [config.cc:268] 2017-06-08 14:09:52.249 | LoadYML: config/1c1s1p.yml
I [config.cc:383] 2017-06-08 14:09:52.250 | BuildSiteProcMap
I [config.cc:268] 2017-06-08 14:09:52.250 | LoadYML: config/tpcc.yml
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [config.cc:632] 2017-06-08 14:09:52.260 | group size: 1
I [s_main.cc:26] 2017-06-08 14:09:52.261 | client_setup_heartbeat
I [s_main.cc:69] 2017-06-08 14:09:52.261 | server enabled, number of sites: 1
I [s_main.cc:94] 2017-06-08 14:09:52.261 | waiting for client setup threads.
I [s_main.cc:77] 2017-06-08 14:09:52.261 | launching site: 0, bind address 0.0.0.0:8101
I [server_worker.cc:78] 2017-06-08 14:09:52.268 | start data population for site 0
I [server_worker.cc:106] 2017-06-08 14:09:57.110 | data populated for site: 0, partition: 0
I [s_main.cc:85] 2017-06-08 14:09:57.110 | table popped for site 0
I [server_worker.cc:122] 2017-06-08 14:09:57.110 | enter SetupService for s101 @ 0.0.0.0:8101
I [server.cpp:434] 2017-06-08 14:09:57.110 | rrr::Server: started on 0.0.0.0:8101
I [server_worker.cc:173] 2017-06-08 14:09:57.110 | Server s101 ready at 0.0.0.0:8101
I [s_main.cc:88] 2017-06-08 14:09:57.110 | start communication for site 0
I [communicator.cc:157] 2017-06-08 14:09:57.110 | connect to site: 127.0.0.1:8101 (attempt 0)
I [s_main.cc:90] 2017-06-08 14:09:57.110 | site 0 launched!
I [s_main.cc:98] 2017-06-08 14:09:57.110 | done waiting for client setup threads.
I [s_main.cc:105] 2017-06-08 14:09:57.110 | server workers' communicators setup
I [s_main.cc:51] 2017-06-08 14:09:57.110 | client enabled, number of sites: 1
I [communicator.cc:157] 2017-06-08 14:09:57.111 | connect to site: 127.0.0.1:8101 (attempt 0)
I [s_main.cc:126] 2017-06-08 14:09:57.111 | wait_for_clients: wait for client threads to exit.
I [communicator.cc:69] 2017-06-08 14:09:57.118 | Done waiting to connect to client leaders.
I [client_worker.cc:137] 2017-06-08 14:09:57.118 | closed loop clients.
I [client_worker.cc:115] 2017-06-08 14:09:57.118 | coordinator 65536 created at site 1: forward 0
I [client_worker.cc:224] 2017-06-08 14:09:57.118 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 14:09:57.118 | start txn!!! : 0
I [client_worker.cc:224] 2017-06-08 14:09:57.119 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 14:09:57.120 | start txn!!! : 0
I [client_worker.cc:224] 2017-06-08 14:09:57.120 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 14:09:57.120 | start txn!!! : 0
I [client_worker.cc:224] 2017-06-08 14:09:57.121 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 14:09:57.126 | start txn!!! : 0
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
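
For anyone triaging several .res files at once: the `what():  map::at` line is the message libstdc++ produces when `std::map::at` is called with a key that is not in the map, so the last log line before it is a useful hint about where the lookup happened. Below is a rough sketch (not part of the janus repo) that pulls the exception type, the message, and the last log line out of such a file. The function name `summarize_crash` and the regexes are hypothetical, and the log-line shape is assumed from the output above.

```python
import re

# Assumed log-line shape, copied from the output above:
#   I [coord.cc:82] 2017-06-08 14:09:57.126 | start txn!!! : 0
LOG_LINE = re.compile(r"^\w \[(?P<src>[^\]]+)\] (?P<ts>\S+ \S+) \| (?P<msg>.*)$")
THROWN = re.compile(r"terminate called after throwing an instance of '([^']+)'")
WHAT = re.compile(r"^\s*what\(\):\s*(.*)$")


def summarize_crash(text):
    """Return exception type, what() text, and the last log line before the crash.

    Returns None when the text contains no "terminate called" line.
    """
    last_log = None
    exception = None
    what = None
    for line in text.splitlines():
        log = LOG_LINE.match(line)
        if log and exception is None:
            # Remember only log lines seen before the exception was reported.
            last_log = (log.group("src"), log.group("msg"))
            continue
        thrown = THROWN.search(line)
        if thrown:
            exception = thrown.group(1)
            continue
        msg = WHAT.match(line)
        if msg:
            what = msg.group(1)
    if exception is None:
        return None
    return {"exception": exception, "what": what, "last_log": last_log}


sample = """\
I [client_worker.cc:224] 2017-06-08 14:09:57.121 | DispatchRequest: 1
I [coord.cc:82] 2017-06-08 14:09:57.126 | start txn!!! : 0
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
"""
print(summarize_crash(sample))
```

To use it on a real result file, read the file and pass its contents in, e.g. `summarize_crash(open("none-1c1s1p-tpcc.res").read())`.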

@shuaimu
Contributor

shuaimu commented Jun 8, 2017

Just to confirm, do other bench/concurrency combinations run?

@sgpthomas
Author

Yes. tapir works with all benchmarks; tpl_ww works with everything but tpcc and tpcc_no. I'm running the other combinations now, and I'll update you with a full list of what works and what doesn't when this finishes.

@sgpthomas
Author

Full list:

sgt43@node-0:~/janus$ ./test_run.py -s 1c1s1p
mode           site      bench     concurrent     result 	 time 
none           1c1s1p    rw        concurrent_10  OK     	 24.75s
none           1c1s1p    tpca      concurrent_10  OK     	 21.33s
none           1c1s1p    tpcc      concurrent_10  Failed 	 5.04s
none           1c1s1p    tpcc_no   concurrent_10  Failed 	 4.99s
none           1c1s1p    tpcc_pm   concurrent_10  OK     	 26.45s
none           1c1s1p    tpccd     concurrent_10  OK     	 25.55s
tpl_ww         1c1s1p    rw        concurrent_10  OK     	 23.44s
tpl_ww         1c1s1p    tpca      concurrent_10  OK     	 21.08s
tpl_ww         1c1s1p    tpcc      concurrent_10  Failed 	 4.89s
tpl_ww         1c1s1p    tpcc_no   concurrent_10  Failed 	 4.68s
tpl_ww         1c1s1p    tpcc_pm   concurrent_10  OK     	 25.05s
tpl_ww         1c1s1p    tpccd     concurrent_10  OK     	 25.52s
occ            1c1s1p    rw        concurrent_10  OK     	 23.94s
occ            1c1s1p    tpca      concurrent_10  OK     	 20.83s
occ            1c1s1p    tpcc      concurrent_10  Failed 	 6.19s
occ            1c1s1p    tpcc_no   concurrent_10  Failed 	 6.19s
occ            1c1s1p    tpcc_pm   concurrent_10  OK     	 25.10s
occ            1c1s1p    tpccd     concurrent_10  OK     	 25.57s
tpl_ww_paxos   1c1s1p    rw        concurrent_10  Failed 	 4.49s
tpl_ww_paxos   1c1s1p    tpca      concurrent_10  Failed 	 1.78s
tpl_ww_paxos   1c1s1p    tpcc      concurrent_10  Failed 	 6.24s
tpl_ww_paxos   1c1s1p    tpcc_no   concurrent_10  Failed 	 4.84s
tpl_ww_paxos   1c1s1p    tpcc_pm   concurrent_10  Failed 	 6.14s
tpl_ww_paxos   1c1s1p    tpccd     concurrent_10  Failed 	 7.65s
occ_paxos      1c1s1p    rw        concurrent_10  Failed 	 4.94s
occ_paxos      1c1s1p    tpca      concurrent_10  Failed 	 1.88s
occ_paxos      1c1s1p    tpcc      concurrent_10  Failed 	 5.04s
occ_paxos      1c1s1p    tpcc_no   concurrent_10  Failed 	 5.04s
occ_paxos      1c1s1p    tpcc_pm   concurrent_10  Failed 	 6.34s
occ_paxos      1c1s1p    tpccd     concurrent_10  Failed 	 7.75s
tapir          1c1s1p    rw        concurrent_10  OK     	 24.44s
tapir          1c1s1p    tpca      concurrent_10  OK     	 21.38s
tapir          1c1s1p    tpcc      concurrent_10  OK     	 24.90s
tapir          1c1s1p    tpcc_no   concurrent_10  OK     	 25.06s
tapir          1c1s1p    tpcc_pm   concurrent_10  OK     	 25.25s
tapir          1c1s1p    tpccd     concurrent_10  Failed 	 7.75s
rcc            1c1s1p    rw        concurrent_10  Failed 	 4.79s
rcc            1c1s1p    tpca      concurrent_10  Failed 	 2.16s
rcc            1c1s1p    tpcc      concurrent_10  Failed 	 5.79s
rcc            1c1s1p    tpcc_no   concurrent_10  Failed 	 5.84s
rcc            1c1s1p    tpcc_pm   concurrent_10  Failed 	 5.79s
rcc            1c1s1p    tpccd     concurrent_10  Failed 	 6.39s
janus          1c1s1p    rw        concurrent_10  Failed 	 4.49s
janus          1c1s1p    tpca      concurrent_10  Failed 	 1.88s
janus          1c1s1p    tpcc      concurrent_10  Failed 	 5.56s
janus          1c1s1p    tpcc_no   concurrent_10  Failed 	 5.54s
janus          1c1s1p    tpcc_pm   concurrent_10  Failed 	 5.49s
janus          1c1s1p    tpccd     concurrent_10  Failed 	 6.99s
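
A table this long is easier to scan when the failures are grouped by mode. Here is a minimal sketch that does that, assuming the six-column layout printed above (mode, site, bench, concurrent, result, time). The helper names `parse_results` and `failures_by_mode` are made up for illustration and are not part of test_run.py.

```python
from collections import defaultdict


def parse_results(text):
    """Parse test_run.py table output into (mode, site, bench, concurrent, result) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have six whitespace-separated fields; the fifth is OK or Failed.
        # This also skips the header row, whose fifth field is the word "result".
        if len(parts) != 6 or parts[4] not in ("OK", "Failed"):
            continue
        mode, site, bench, concurrent, result, _elapsed = parts
        rows.append((mode, site, bench, concurrent, result))
    return rows


def failures_by_mode(rows):
    """Map each mode to the list of benchmarks that failed under it."""
    failed = defaultdict(list)
    for mode, _site, bench, _concurrent, result in rows:
        if result == "Failed":
            failed[mode].append(bench)
    return dict(failed)


# A few rows copied from the table above.
sample = """\
mode           site      bench     concurrent     result   time
none           1c1s1p    rw        concurrent_10  OK       24.75s
none           1c1s1p    tpcc      concurrent_10  Failed   5.04s
none           1c1s1p    tpcc_no   concurrent_10  Failed   4.99s
tapir          1c1s1p    tpccd     concurrent_10  Failed   7.75s
"""
for mode, benches in failures_by_mode(parse_results(sample)).items():
    print(mode, "failed:", ", ".join(benches))
```

Splitting on whitespace rather than fixed column offsets keeps the sketch tolerant of the mixed tabs and spaces in the real output.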

@shuaimu
Contributor

shuaimu commented Jun 8, 2017

Thanks! I am able to reproduce this. I will look into why.

rcc/tpccd are deprecated code and should be removed from the script. The rest definitely require a fix. "none" is a special case: it represents "no concurrency control", so perhaps it crashed for a reason. I will look into it.

In the meantime, if you want a version that runs, feel free to roll back to commit f45fd04. I just checked that it runs okay. (janus is aliased as "brq" in the script at that commit.)
