-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the O(n^3) performance issue in is_connected
utility function
#397
base: develop
Are you sure you want to change the base?
Conversation
Adhere to the specification in comment: // Is the undirected graph connected? // Is the directed graph strongly connected? Instead of checking if each vertex is reachable from each vertex (which is O(n^3)), consider different approaches for directed and undirected graphs: For an undirected graph check if each vertex was reachable with a single DFS walk. This runs in O(N) (modulo back edges). For a directed graph run Tarjan SCC algorithm and check if all vertices end up in a single component. This runs in O(N+E). The speed-up is considerable, e.g. for a 25x25 square connected graph it is about: time: undirected, connected - prev implementation - elapsed 16.458s time: undirected, connected - new implementation - elapsed 6.1249e-05s time: directed, connected - prev implementation - elapsed 7.45684s time: directed, connected - new implementation - elapsed 4.9894e-05s For similar 80x80 graph it's about 0.000563113s, while it just takes forever with the previous O(n^3) approach.
Adhere to the specification in function comment: // Is the undirected graph connected? // Is the directed graph strongly connected? Instead of checking if each vertex is reachable from each vertex (which is O(n^3)), consider different approaches for directed and undirected graphs: For an undirected graph check if each vertex was reachable with a single DFS walk. This runs in O(N) of a single connected component (modulo its back edges). For a directed graph run Tarjan SCC algorithm and check if all vertices end up in a single component. This runs in O(N + E). The speed-up is considerable, e.g. for a 25x25 square connected graph it is about: time: undirected, connected - prev implementation - elapsed 16.458s time: undirected, connected - new implementation - elapsed 6.1249e-05s time: directed, connected - prev implementation - elapsed 7.45684s time: directed, connected - new implementation - elapsed 4.9894e-05s For similar 80x80 graph it's about 0.000563113s, while it just takes forever with the previous O(n^3) approach.
…backward compatibility with the previous implementation
I've also opened #398 to track this. |
This resolves a build error due to boost::detail::get preferred without the using declaration: ../../../boost/pending/detail/disjoint_sets.hpp:58:24: error: no matching function for call to 'get(long unsigned int*&, int&)' 58 | assert(i == get(p, i)); ../../../boost/graph/reverse_graph.hpp:561:7: note: candidate: 'template<class E> E boost::detail::get(boost::detail::underlying_edge_desc_map_type<E>, const boost::detail::reverse_graph_edge_descriptor<EdgeDesc>&)' 561 | E get(underlying_edge_desc_map_type< E > m, | ^~~ ../../../boost/graph/reverse_graph.hpp:561:7: note: template argument deduction/substitution failed: ../../../boost/pending/detail/disjoint_sets.hpp:58:24: note: mismatched types 'boost::detail::underlying_edge_desc_map_type<E>' and 'long unsigned int*' 58 | assert(i == get(p, i)); | ~~~^~~~~~
I only have one question: how does someone with almost no activity history in GitHub know so much about Boost.Graph? :) |
A great question! Well, I read the book as soon at it was published ;) And Alexander Stepanov's recommendation meant a lot to me, I know him personally and appreciate his inventions a lot. I also ended up rolling up my own graph library before, because Boost was not welcome in my organization. As of GitHub, I hope to publish a few things soon, including the small project, where I encountered this error. One reason I actually hit it is that I go straight to the code first and resort to the documentation when the code doesn't work as I'd expect. Thanks for looking into this! |
For the record, this function was added in 86ef707 on |
Interesting. Did you also work with him? I'm also a great admirer of his work and managed to visit him once for a chat; unfortunately, Palo Alto is a long way from Australia. Anyway, my question was fairly flippant, so let me address an actual concern: maybe this function shouldn't exist at all? As you pointed out, the function is undocumented and not used anywhere in the code base that I can see, so we are free to reinvent it. |
No, we also only met for a few days.
Yes, I like this idea. Let me draft an implementation do discuss in greater details. I'd still consider it important to keep the interface backward-compatible. What do you think about it? Also is it possible to mark a function as deprecated in Boost? I only see the deprecated headers or comments for function deprecation. |
And one more question: I see |
is_connected.hpp Implement 3 connection modes for directed graphs: strong, unilateral and weak
Remove the obsolete `run_test` invocation.
Hi I updated this FR with your suggestions, please have a look. One can now call the algorithm like this:
which all returns the same for an undirected graph (this makes sense conceptually). Does this correspond to your direction of thought? If yes, I'll add some documentation. I think there are 3 questions at hand, I made some reasonable trade offs here, but would like to discuss this more. Probably it is good enough, but sure there is some place for improvement.
While working on this I spotted a few other things; I'll probably send separate PRs for them to keep this one focused on is_connected. I look forward to your comments and suggestions on the subject. Many thanks for looking into this! |
Apologies for the delay in replying, I'm in the midst of relocating to a different country.
I know backward-compatibility is often sought after or required, but I think it's given too much priority without consideration of the value provided. In this case, as I mentioned:
so in my opinion, backward-compatibility is not a priority. I think it's better to draft an interface first before moving on to code, but I'm too late with that advice.
The relationship between the kinds of directed connectivity is that strongly ⇒ weakly and strongly ⇒ unilaterally, but nothing else (i.e. unilaterally does not ⇒ weakly nor vice versa). So I wouldn't use the struct/enum design you currently have, I would do
I believe that's how the graph concepts are done too, have a look around in
https://en.cppreference.com/w/cpp/language/attributes/deprecated |
Ah, I see we both wrote long comments at the same time; I didn't see yours until after I finished mine. |
Hmmm, my idea of using the template parameter is not essential, btw. Using the second parameter for a tag like you have done is probably preferable and I think follows existing conventions. One benefit of the tag dispatch is that |
Keep it not required for undirected graphs. Also rename is_connected_kind to connected_kind
I wish you very smooth relocation, definitely not an easy thing to do ;)
Yes, I ended up implementing this behaviour even before reading your comment! I agree, that explicit is better than implicit even when it requires a bit more typing. Please see the recent version. Thanks for commenting on the backward compatibility. I think a static assert with a reasonable message would be good enough for now. Also thanks for confirming the kind selection tags strategy, it's probably good enough for now. Overall I think this code is now in a reasonably good shape to be useful already. I will look into selecting the proper output property maps and picking associative ones when the vertex_index is not present in the graph. I think this would be more usable than providing an external index and still reasonably fast. This can also follow as a different PR. |
The
is_connected()
function is implemented in terms ofis_rechable()
, effectively checking if each vertex is reachable from each other vertex in a nested loop. This quickly becomes terribly slow as the number of vertices and edges grow, becoming roughly unusable with a square connected graph of size 20x20.This request fixes the issue by
This reduces the algorithm complexity from cubic to linear and makes it possible to use it on large graphs. See also the commit message in 80ae041
This change is fully backward compatible with the previous implementation.
It also adds a new
is_connected.cpp
test. Feel free to run this test on the previous implementation. This test passes without the new algorithm, but picks the size of 20x20 square to be considerably slow. This test also accepts an integer as argv[1], the size of the square, so it is possible to observe the slow-down growing as the size increases (the default is 20).Please see an excerpt of my measurement below (I can upload this program as well). The previous implementation quickly becomes impracticably slow.
I note, that
is_connected
is not documented and is not widely used. Still I consider it to be important to fully adhere to the interface specified in comment and preserve the compatibility, as it could have been used in non-public projects. In short one important decision here is to avoid any breaking changes.I foresee a few concerns with this request, which I'll be happy to discuss and address if possible:
graph_utility.hpp
header file seems to be designed as low-dependency. Including < vector > and <strong_components.hpp> there seems to violate that choice. But I think the practical aspect of this is reasonably minimal; while fixing the issue fully preserving the backward compatibility is important in practice.strong_components
do not accept the colormap and pass it to DFS. I don't see this as a big concern. This can be addressed in 2 ways: updating SCC interface to pass the colormap to DFS, or (probably better) provide the version ofis_connected
with a single argument and no colormap.VertexIndexGraph
concept. (I think) This can be fixed by usingmap
instead ofvector
for the component map (comp_map
) in case the graph is not VertexIndexGraph. I'm not sure, how important is this, but this seems to be a deviation from the full backward compatibility.I consider a bit of follow-up work on this, which can also be done in a separate pull request:
is_connected(const Graph& g)
. This will require including the colormap header into the graph_utilityis_connected
(and other utility functions likeis_reachable
)fixes #398
Measurement: