Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement graph components and archetypes #7500

Draft
wants to merge 81 commits into
base: main
Choose a base branch
from

Conversation

grtlr
Copy link
Contributor

@grtlr grtlr commented Sep 24, 2024

This implements basic graph primitives in the Rerun data model. This is a first step towards visualizing node-link diagrams in Rerun (related issue: #6898).

In addition to the changes to the data model, this PR adds two example crates, node_link_graph and graph_view to the Rust examples that show how these primitives can be used.

Design Decisions

  • Nodes and edges are stored as components and can be batched. To have a single node per entity we can use Rerun’s [clamping mechanism](https://rerun.io/docs/concepts/batches#component-clamping).
  • GraphNodeId is modeled as u32 to improve performance when using petgraph strings for better user experience.
  • A node is unique identified by combining its GraphNodeId and its EntityPath.
  • Labels of the nodes can be set via the labels component and toggled via show_labels
  • Hierarchical graphs can be modeled through entity paths. For edges that cross entity boundaries we can insert dummy nodes to properly render subparts of the hierarchy.
  • Nodes and edges need to be logged to different entities, otherwise the selections will collide. We provider helper functions / conversions to link nodes that are stored in different entities.

Logging example

rec.set_time_sequence("frame", 2);
rec.log("living/objects", &GraphNodes::new(["table"]))?;
rec.log("living/areas", &GraphNodes::new(["area0", "area1", "area2"]))?;

rec.log(
    "living/edges",
    &GraphEdgesDirected::new([
        // Both source and target are in the same entity
        ("living/areas", "area0", "area1"),
        ("living/areas", "area0", "area2"),
        ("living/areas", "area1", "area2"),
    ]),
)?;

rec.log(
    "reachable",
    &GraphEdgesUndirected::new([
        // Source and target are in different entities.
        (("living/areas", "area1"), ("living/objects", "table")),
    ]),
)?;

TODOs

  • Get rid of the Default derive for GraphNodeId and GraphEdge in the flatbuffer definitions.
  • Improve ergonomics for generating graph edges during logging.
  • Ensure that logging works from Python and C++ too.
  • Fix remaining lints.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@nikolausWest
Copy link
Member

In this design, are the node id's global?

"attr.rust.custom_clause":
'cfg_attr(feature = "serde", derive(::serde::Serialize, ::serde::Deserialize))'
) {
edge: [rerun.datatypes.GraphNodeId: 2] (order: 100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any practical use (in robotics, primarily) in support hypergraphs (where an edge can join multiple nodes)?

Copy link
Contributor Author

@grtlr grtlr Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm 🤔, I'm not aware of any practical applications that use hypergraphs. There also does not seem to be any support in the petgraph crate, so maybe we should focus on regular graphs first. If we think hypergraphs will become important, we can probably add a HyperGraphEdge component to remain backwards compatible.

Of course, an hypergraph could also be "normalized" by adding virtual nodes for the edges that join multiple nodes.

@grtlr
Copy link
Contributor Author

grtlr commented Sep 24, 2024

@nikolausWest Good point! So far I've made the assumption that all node IDs are global and can be referenced across entities to allow edges that connect nodes from different entities (similar to ClassId).

We could probably use a namespaced approach (e.g. based on entity path) by changing the way we gather the nodes from the entities that are currently being visualized. Then, we would have to store this entity information in the edges though.

However, it's probably simpler for the user to encode this information in the node IDs themselves. In the case of String IDs, this could be done by appending the entity path as a suffix. If we stick with IDs based on integers users could for example use factorizations.

Do you have a particular use case for namespaced node IDs in mind? How would you expect Rerun to behave in that case?

@nikolausWest
Copy link
Member

So far I've made the assumption that all node IDs are global and can be referenced across entities to allow edges that connect nodes from different entities (similar to ClassId).

Class Ids aren't actually global in the sense that to resolve a class id into e.g. a color, we walk up the graph to the first Annotation Context and use that to look up the value. That means these are actually scoped.

Do you have a particular use case for namespaced node IDs in mind? How would you expect Rerun to behave in that case?

I'm rather thinking about what happens if a user has multiple graphs. It would be a bit strange if edges started going between them because the user didn't correctly partition their node ids.

@nikolausWest
Copy link
Member

One option to consider here could be to have two types of edges. One that is within-entity edges (just a pair of node ids). The other could be between entity edges (destination entity, optional(source id), optional(destination id)). There might be several ways to model that but that would at least be the general idea

@grtlr
Copy link
Contributor Author

grtlr commented Sep 25, 2024

@nikolausWest Brief update:

Edges now have two additional attributes source_entity and target_entity as you described above. That way we can:

  1. Make edges local to an entity by default to avoid collisions.
  2. Allow linking between nodes of different entities.

The following shows a new toy example of the new logging API, which I also streamlined: https://github.com/rerun-io/rerun/pull/7500/files#diff-d054c306b388fcc1e8daf9f0477735519df3eeb486030979c62478b5d43404dcR36-R66.

I've also improved the debug graph viewer to show lists of nodes, edges, and their corresponding entity path. I'm currently working on getting overrides in the UI to work so that we can color nodes and edges to more easily understand the data model. This also helps me better understand the visualizer concepts.

I have one outstanding design decision:

The current systems allows for edges to live in completely unrelated parts of the entity hierarchy. This means that when the user choses to visualize a certain sub-entity not all edges are retrieved by default and the user has to manually specify the additional entities that contain the edges using Entity Path Filters:

+ /doors/**                <- contains the global edges
+ /hallway/areas/**        <- the actual entity that the user wants to visualize

Since this can be confusing due to the lack of discoverability, I think we should pull in global edges from outside the current hierarchy and visualizing them as dummy nodes. In the current design this forces us to iterate over all edges starting from the root. Should we mitigate this by introducing a new GraphCrossEntityEdge component, similar to what you described above?

@nikolausWest
Copy link
Member

@grtlr I actually think having edges on different entities than nodes is the main reason to go with your proposal. That allows users to put different meta data on edges and nodes which is a very important feature. If you can think of a way to still keep it more local that might be worth while.

Since this can be confusing due to the lack of discoverability, I think we should pull in global edges from outside the current hierarchy and visualizing them as dummy nodes. In the current design this forces us to iterate over all edges starting from the root.

I'm not completely sure about this. Maybe @Wumpf has some thoughts?

@Wumpf
Copy link
Member

Wumpf commented Sep 26, 2024

Haven't been following the entire discussion, but there's a lot of issues with pulling data from outside of a viewer's query/entity-pathfilter:

  • changes the query mechanism from going through a higher level abstraction to direct store queries
    • we actually want to get to a place where we can predict the query of a view perfectly just from looking at how its configured, not knowing its specific type
    • we're quite far from that and there's existing violations of that rule, 3d transforms being the most prominent one
  • blueprint can't be applied to everything outside the path filter
    • blueprint overrides have no effect
    • blueprint defaults on the View have no effect
  • if we pull data outside of the path filter, how to stop certain data from being ingested?
    • control for that is blueprint
    • does this cause a huge query?
    • how do I display independent graphs in independent views?

@nikolausWest
Copy link
Member

I think it's pretty clear we shouldn't include data from outside the entity path query. I think one question that leaves though is how to handle edges that are within the included entities but refer to nodes that are not. Perhaps some kind of greyed out nodes could make sense there (maybe even an option to include or exclude those)

@grtlr
Copy link
Contributor Author

grtlr commented Sep 26, 2024

Thank you @Wumpf for the clarification—that makes a lot of sense!

@nikolausWest I agree that we should show those as edges to some dummy nodes. In fact this is how I stumbled upon the above problem in the first place (undirected edges).

@grtlr
Copy link
Contributor Author

grtlr commented Oct 14, 2024

@emilk, @Wumpf: @nikolausWest asked me to post this here to open up the discussion:

I wanted to sync with you to map out the next steps of the graph implementation. It would be awesome if we could settle on a definition-of-done (DoD) for the initial version. I'm happy to add features and improve my work going forward, but I think having a milestone to work against will help a lot.

There are several areas that I've worked on and that will require future work:

Data model

Overall I think the data model is finished. Becaue of the use of flatbuffers we can easily add fields like edge and node weights in the future. There are currently some fields in the model like position that are not used, I would simply remove them before we merge the changes (depending on the DoD). If the current design is approved, I would start writing some helper functions for the Python and C++ logging libraries.

Graph Viewer

The basic graph viewer is working correctly. It has the following features:

  • Dragging of nodes
  • Can show directed/undirected edges (for now only straight lines)
  • Highlights nodes and selections
  • Toggle between node labels on/off.
  • Shows dummy nodes (nodes that are part of edges that live outside of the current selected entity hierarchy)

The code for the graph viewer still requires quite a bit of cleanup as I was still experimenting with the different layout algorithms, but I think I have a pretty good idea on what needs to be done—and I'm currently working on that.

Graph layouts

This is the biggest outstanding area, as the existing crates in the Rust ecosystem are all lacking—I implemented layout providers for all apart from graphviz-rust:

  • layout-rs (GraphViz-like layouts) does only expose the node positions and not the control points for the edges. We would have to fork this project.
  • graphviz-rust requires the GraphViz CLI to be present, which adds complexity to the build for native releases, and would require us to mainain a separate version for the web viewer. There is a WASM version of GraphViz but I still think this solution would have too many moving parts.
  • fdg and fdg-sim (force-based layouts): These are the most promising—but they still have some significant limitations for our use cases. The currently released version, fdg-sim is missing some flexibility (e.g. does not allow adding custom forces). There is an unreleased version, fdg, that fixes some of these shortcomings. However, dynamically adding and removing edges will force us to modify some of that libraries internal state. It is also missing a collision force as well as positioning forces (which we need for properly visualizing disjoint graphs). Another risk is, that there are not indications that the unreleased version will be released in the next time, as development has slowed down.

Unfortunately, I think the features provided by the existing crates don't really suffice for our needs.

Ideally we would want to have a Rust version of: https://d3js.org/d3-force. Porting it over should be pretty straight forward, as it is not too much code. To efficiently find collisions, we would also need a quadtree. As per @Wumpf we don't yet have an implementation in Rerun, and I don't know how good the existing implementations on crates.io are—do you have experience with quadtree crates in Rust?

Summary

Overall, I don't think it makes sense to merge the graph primitives without having at least one basic (but robust) layout algorithm implemented to visualize them—we just need to consider if it is worth adding this additional complexity to Rerun.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants