bidijkstra stall-on-demand #18 (Closed)

Stunkymonkey opened this issue Apr 7, 2020 · 21 comments

Comments

@Stunkymonkey (Contributor)

I am not sure if I understand stall-on-demand fully.

This one will improve query time for sure 😉
I have seen that @easbar implemented it for graphhopper, so I guess you understand it fully.

Please check my understanding:
The blocking happens before the neighboring nodes are added to the heap, because adding nodes to the heap is the expensive part.

So we iterate over the incoming edges and their neighbors and check whether a neighbor with a higher level is already settled. If so, we stop here?
If not, we iterate as usual over the outgoing edges and their neighbors and add them to the heap.

Do we have to store something when a node is blocked?

@Stunkymonkey (Contributor, Author)

There is something missing:
The node should only be blocked if a node with a higher level is already settled and the path via that higher-level node is shorter.
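
A minimal sketch of the complete stall check as described in the two comments above (hypothetical names only; rank, in_edges, dist and settled are assumed data structures, not the actual fast_paths or osm_ch code):

// Hypothetical stall-on-demand check: `node` was just popped from the heap
// with tentative distance `node_dist`; it is stalled (not expanded) if a
// settled neighbor with a higher rank offers a shorter path to it.
fn should_stall(
    node: usize,
    node_dist: u64,
    rank: &[usize],
    in_edges: &[Vec<(usize, u64)>], // in_edges[v] = (neighbor, edge weight) pairs
    dist: &[u64],                   // best tentative distances, u64::MAX = unreached
    settled: &[bool],
) -> bool {
    for &(neighbor, weight) in &in_edges[node] {
        // Only a neighbor with a higher rank (level) can stall this node.
        if rank[neighbor] <= rank[node] {
            continue;
        }
        // Stall if the neighbor is settled and the path through it is
        // strictly shorter than the distance the node was popped with.
        if settled[neighbor] && dist[neighbor].saturating_add(weight) < node_dist {
            return true;
        }
    }
    false
}

Here the rank check and the distance comparison from the two comments above are combined into a single condition.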

@easbar (Owner)

easbar commented Apr 7, 2020

Yes, the idea is not to expand nodes that are settled even though they were not reached via a shortest path. I tried this here: 94c01eb, but to my surprise I did not notice any speed-up yet. Maybe I did something wrong. The first thing that should be checked (and I did not do this yet) is whether the optimization reduces the number of settled nodes. If it does not, something is wrong. If it does but the query speed is not improved, I guess either the map is too small to benefit from stall-on-demand, or already doing this simple check is too costly, such that it is not worth it even though the number of settled nodes is reduced.
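
For that check, a minimal sketch of the kind of counters one could thread through the query (purely illustrative, not the fast_paths API; similar fwd-polls/bwd-polls counters show up later in this thread):

// Hypothetical counters for verifying that stalling reduces work.
#[derive(Debug, Default)]
struct QueryStats {
    polls: u64,   // nodes popped from the heap
    settled: u64, // nodes expanded along their outgoing edges
    stalled: u64, // nodes discarded by the stall check
}

// Increment polls on every heap pop, stalled whenever the stall check fires,
// and settled otherwise; comparing the totals with and without the
// optimization shows whether the number of settled nodes actually drops.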

@Stunkymonkey (Contributor, Author)

But wouldn't saving the fact that a node was blocked be helpful in the future?

@Stunkymonkey (Contributor, Author)

Stunkymonkey commented Apr 7, 2020

I will try some things with the OSM graph of Germany.

@easbar (Owner)

easbar commented Apr 7, 2020

But wouldn't saving the fact that a node was blocked be helpful in the future?

Whenever we pull a node from the heap, it should either be expanded/settled or, if it is blocked, be discarded, but every node should be processed only once. If that's not the case, this might be the/a problem (I thought that does not happen, but maybe I missed it).
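
A hedged sketch of one search direction with exactly this behavior, reusing the hypothetical should_stall function from the sketch further up (illustrative names, not the actual fast_paths code; the termination and meeting logic of the bidirectional query is omitted):

use std::cmp::Reverse;
use std::collections::BinaryHeap;

// One direction of a bidirectional CH query (sketch). Every node popped from
// the heap is handled exactly once: entries for already-settled nodes and
// stalled nodes are discarded, everything else is settled and expanded along
// upward edges only.
fn forward_search(
    start: usize,
    rank: &[usize],
    out_edges: &[Vec<(usize, u64)>],
    in_edges: &[Vec<(usize, u64)>],
) -> Vec<u64> {
    let n = rank.len();
    let mut dist = vec![u64::MAX; n];
    let mut settled = vec![false; n];
    let mut heap = BinaryHeap::new();
    dist[start] = 0;
    heap.push(Reverse((0u64, start)));

    while let Some(Reverse((d, node))) = heap.pop() {
        // Discard stale heap entries for nodes that were already settled.
        if settled[node] {
            continue;
        }
        settled[node] = true;
        // Stall-on-demand: discard the node without expanding it if a settled,
        // higher-ranked neighbor offers a shorter path to it.
        if should_stall(node, d, rank, in_edges, &dist, &settled) {
            continue;
        }
        // Relax only upward edges (towards higher-ranked nodes).
        for &(target, weight) in &out_edges[node] {
            if rank[target] <= rank[node] {
                continue;
            }
            let new_dist = d + weight;
            if new_dist < dist[target] {
                dist[target] = new_dist;
                heap.push(Reverse((new_dist, target)));
            }
        }
    }
    dist
}

The backward search would be symmetric, with the edge directions swapped.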

I will try some things with the OSM graph of Germany.

Cool, thanks!

@Stunkymonkey (Contributor, Author)

okay in my case the number of visited nodes is always the same with/without stalling

@Stunkymonkey (Contributor, Author)

also I am not seeing any check for the node level

@Stunkymonkey (Contributor, Author)

Oh wait, my implementation is not correct. I tried adapting it to my case.

@Stunkymonkey (Contributor, Author)

still the number of visited nodes is always the same

@easbar (Owner)

easbar commented Apr 7, 2020

still the number of visited nodes is always the same

Well, that's good :) It means that once we find out why, there is still hope for improvements this way.
I might have a look when I find the time.

@Stunkymonkey (Contributor, Author)

my implementation: Stunkymonkey/osm_ch@ba94b4c

@Stunkymonkey (Contributor, Author)

Stunkymonkey commented Apr 14, 2020

I fixed my implementation and got a speedup:
the overall changes are here

I will hopefully check your implementation later this evening.

@easbar (Owner)

easbar commented Apr 14, 2020

Cool, can you check what happens if you leave out the nodes[way.target].rank > nodes[node].rank condition (in your code)? And how big is the speedup you saw in your implementation?

@Stunkymonkey (Contributor, Author)

The speedup for a single query can be around 2.5, but it can also be a little worse.
I have not averaged them. I will test this in your implementation.

I feel like local queries are slower, but queries over long distances get a better speedup. This is not verified yet, just a feeling.

I will check later if the rank has to be checked.

@easbar (Owner)

easbar commented Apr 14, 2020

the speedup for a single query can be around 2.5

Yes that matches my previous experience with this optimization

I feel like local queries are slower, but queries over long distances get a better speedup.

Yes that also makes sense

@Stunkymonkey (Contributor, Author)

Cool, can you check what happens if you leave out the nodes[way.target].rank > nodes[node].rank condition (in your code)?

If I leave it out, no path is found.

@Stunkymonkey (Contributor, Author)

#19 still no speedup 😒

easbar mentioned this issue Aug 30, 2020
@easbar (Owner)

easbar commented Aug 30, 2020

I added a performance test for a larger map (California+Nevada) and some counters to see the number of polled nodes here: https://github.com/easbar/fast_paths/compare/stall_on_demand
The test can be run with the command:

export RUST_TEST_THREADS=1; cargo test run_performance_test_time_cali --release -- --ignored --nocapture

Running this with and without the stall-on-demand optimization I got:

without the optimization

test tests::run_performance_test_time_cali ... Running performance test for California&Nevada time
number of nodes (input graph) ..... 1890816
number of edges (input graph) ..... 4630444
preparation time .................. 59681 ms
number of nodes (fast graph) ...... 1890816
number of out-edges (fast graph) .. 3973185
number of in-edges  (fast graph) .. 3973464
total query time .................. 18606 ms
query time on average ............. 186 micros
fwd-polls ..... 16682887
bwd-polls ..... 16691229
ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 42 filtered out

with the optimization

test tests::run_performance_test_time_cali ... Running performance test for California&Nevada time
number of nodes (input graph) ..... 1890816
number of edges (input graph) ..... 4630444
preparation time .................. 62399 ms
number of nodes (fast graph) ...... 1890816
number of out-edges (fast graph) .. 3973185
number of in-edges  (fast graph) .. 3973464
total query time .................. 15889 ms
query time on average ............. 158 micros
fwd-polls ..... 10481019
bwd-polls ..... 10478056
ok

So the number of polls is reduced by about 37% and the average query time is reduced from 186 to 158 µs (-15%) in this test. So not bad, but maybe also not what we were hoping for.

@easbar (Owner)

easbar commented Aug 30, 2020

Btw, the DIMACS graphs can be found here: http://www.dis.uniroma1.it/~challenge9/, but unfortunately it seems they are no longer available (or at least not available right now), as they created a new website (?)

@easbar (Owner)

easbar commented Aug 30, 2020

Doing the same for the distance metric:

export RUST_TEST_THREADS=1; cargo test run_performance_test_dist_cali --release -- --ignored --nocapture

yields better results:

without the optimization

test tests::run_performance_test_dist_cali ... Running performance test for California&Nevada dist
number of nodes (input graph) ..... 1890816
number of edges (input graph) ..... 4630444
preparation time .................. 96429 ms
number of nodes (fast graph) ...... 1890816
number of out-edges (fast graph) .. 4145281
number of in-edges  (fast graph) .. 4145308
total query time .................. 37147 ms
query time on average ............. 371 micros
fwd-polls ..... 61221172
bwd-polls ..... 61285180
ok

with the optimization

test tests::run_performance_test_dist_cali ... Running performance test for California&Nevada dist
number of nodes (input graph) ..... 1890816
number of edges (input graph) ..... 4630444
preparation time .................. 99619 ms
number of nodes (fast graph) ...... 1890816
number of out-edges (fast graph) .. 4145281
number of in-edges  (fast graph) .. 4145308
total query time .................. 27168 ms
query time on average ............. 271 micros
fwd-polls ..... 27585522
bwd-polls ..... 27512518
ok

The number of polls is reduced by about 55% and the query time goes down about 30%.

@easbar (Owner)

easbar commented Aug 30, 2020

I committed the stall-on-demand optimization to master here: d13c5e1
It showed a significant speed-up in all benchmarks I ran.
