
aggregation of PATH/PRIMITIVE objects #33

Open
mpadge opened this issue Oct 8, 2017 · 7 comments

Comments


mpadge commented Oct 8, 2017

Might be jumping ahead of things, but dodgr has made me realise there'll be a world of applications which involve analysing relationships between objects. The classic example is graph/network relationships and associated metrics. In these cases, computational efficiency scales directly with the number of objects. The number of PATH/PRIMITIVE (hereon, "PP") objects will often be very large, but many of these may be effectively redundant. Vertices lying on a single line, for example, can all be represented by the start and end points alone (along with line length, or whatever metric), and so aggregated into a single object. Relationships between lines can then be analysed much more efficiently.
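For concreteness, here's a minimal sketch of that reduction (the table layout and function name are illustrative only, not dodgr's or silicate's API): a path's interior vertices collapse to a single start-end record carrying the summed length as the aggregated metric.

library(tibble)

## one path of four ordered vertices; only the endpoints and the
## total length survive aggregation
path <- tibble(x = c(0, 1, 2, 3), y = c(0, 0.1, 0, 0.2))

aggregate_path <- function(p) {
  seg_len <- sqrt(diff(p$x)^2 + diff(p$y)^2)  # per-segment lengths
  tibble(x0 = p$x[1], y0 = p$y[1],
         x1 = p$x[nrow(p)], y1 = p$y[nrow(p)],
         length = sum(seg_len))
}

aggregate_path(path)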

As always, an OSM example: lines tend on average to be composed of around 10 nodes (vertices), so aggregating PP objects into higher-level line objects allows network-based analyses to run around 10 times faster. That kind of speed-up is hugely important. The dodgr issue that discussed some of this reveals the trade-off between the analytical exactness of the PP approach and the computational efficiency of the more conventional sp/sf approach. The latter is inexact yet efficient; the former the opposite. Making PP analysis more efficient simply requires identifying the appropriate level of aggregation, and constructing higher-level objects.

The good news: I've done all the work here for point-to-line aggregation, via pretty highly optimised C++ code, in dodgr. The not-so-good news: there'll be an unlimited number of other use cases, and other means and forms of aggregation that have yet to be identified, and that we'd have to identify. Aggregation of triangles is an obviously necessary extension, but maybe there'd be a neat CGAL solution there? Anyway, I could at least start by porting the dodgr approach to aggregation of PP objects into higher-level lines. These also very neatly correspond to sf::LINESTRING objects, so this would be a neat way to convert silicate into sf in a form much more compact than the current direct conversion.

Lots of words, maybe not much can be done at the mo with this issue, but keen to hear your thoughts.


mpadge commented Oct 8, 2017

And that is also of course an implicit call for an appropriate way to represent hierarchical inter-relationships between objects, both between PP and higher-level objects, and between different kinds and levels of higher-level objects. This is part of what I've been toying with in sphier. I'll try to get something there this coming week, but the aim is to allow everything to work within this simple binary distinction between PP and "higher-level" objects, with a potentially infinite recursion of inter-relationships between endless hierarchical levels able to be represented in 3 simple flat tables (the main silicate three).
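As a rough illustration of the idea (all table and column names here are invented for the sketch, not a committed silicate schema), every extra hierarchical level is just more rows in a flat link table:

library(tibble)

vertex <- tibble(vertex_ = c("v1", "v2", "v3"),
                 x_ = c(0, 1, 2), y_ = c(0, 0, 0))
## level 1: primitive edges link pairs of vertices
edge <- tibble(edge_ = c("e1", "e2"),
               .vertex0 = c("v1", "v2"), .vertex1 = c("v2", "v3"))
## level 2 and beyond: higher-level objects link to edges or to other
## higher-level objects, so recursion never needs a new table
link <- tibble(parent_ = c("line1", "line1", "net1"),
               child_  = c("e1", "e2", "line1"),
               level   = c(1, 1, 2))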


mdsumner commented Oct 11, 2017

I'm very keen to see what you've come up with here! I add this simply as "it's somehow related", and I want to record the thought. Inspired by

https://twitter.com/mdsumner/status/918032157876011008

I believe this is related: the ability to detect when a mesh does not "intersect nicely", i.e. is "non-conforming", because the edges obtained by decomposing paths cannot be normalized.

Here's an approach using silicate to generate sf line segments, which are then easily tested for intersection against the rest of the set: by definition, a non-conforming edge will st_cover more than itself.

It's probably even simpler than this though, since ultimately this is a point-in-line test, so I expect your aggregation tools probably cover it already.

## literal segment instances
library(silicate)  ## PATH(), sc_uid(); path_to_segment() is a silicate helper
library(dplyr)

sc_segment <- function(x, ...) {
  path <- x[["path_link_vertex"]]
  dplyr::bind_rows(lapply(split(path[["vertex_"]], path[["path"]])[unique(path[["path"]])],
                          function(x) path_to_segment(x)), .id = "path")
}

## normalized segments, in a model
EDGE <- function(x, ...) {
  out <- PATH(x)
  seg <- sc_segment(out)
  ## undirected edge identity: sort each vertex pair before pasting
  uu <- paste(pmin(seg[[".vertex0"]], seg[[".vertex1"]]),
              pmax(seg[[".vertex0"]], seg[[".vertex1"]]), sep = "_")
  seg[["edge"]] <- sc_uid(length(unique(uu)))[factor(uu)]
  out[["object_link_edge"]] <- dplyr::transmute(seg, path, edge) %>%
    inner_join(out$path) %>%
    select(object, edge)
  out[["edge_link_vertex"]] <- seg %>%
    select(-path) %>%
    distinct(edge, .keep_all = TRUE)
  out[["path_link_vertex"]] <- NULL
  out[["path"]] <- NULL
  out
}

## once we have unique edges
library(sf)
library(tidyr)
library(purrr)

e <- EDGE(minimal_mesh)  ## minimal_mesh ships with silicate
sf_edges <- st_sfc(tidyr::gather(e$edge_link_vertex, npoint, vertex_, -edge) %>%
                     arrange(edge, npoint) %>%
                     inner_join(e$vertex) %>%
                     split(.$edge) %>%
                     purrr::map(function(x) st_linestring(cbind(x$x_, x$y_))))

l <- st_covers(sf_edges, sf_edges)
## there are none, because minimal_mesh is "nice"
which(purrr::map_int(l, length) > 1)

library(sp)
library(raster)  ## spPolygons() is provided by raster
## the c polygon edge covers an a and b edge, not nice!
p <- spPolygons(matrix(c(0, 0, 3, 3, 0, 
                         0, 1, 1, 0, 0), ncol = 2), 
                matrix(c(0, 0, 3, 3, 0, 
                         1, 3, 3, 1, 1), ncol = 2), 
                matrix(c(3, 3, 5, 5, 3, 
                         0, 3, 3, 0, 0), ncol = 2), 
                attr = data.frame(x = c("a", "b", "c")))

p <- st_as_sf(p)
e <- EDGE(p)
sf_edges <- st_sfc(tidyr::gather(e$edge_link_vertex, npoint, vertex_, -edge) %>% 
                     arrange(edge, npoint) %>% 
                     inner_join(e$vertex) %>% split(.$edge) %>% 
                     purrr::map(function(x) st_linestring(cbind(x$x_, x$y_))) )

l <- st_covers(sf_edges, sf_edges)
result <- which(purrr::map_int(l, length) > 1)
## because
par(mfrow = c(2, 1))
plot(st_geometry(p))
plot(sf_edges[l[[result]][1]], add = TRUE, col = "dodgerblue", lty = 2, lwd = 4)
plot(st_geometry(p))
plot(sf_edges[l[[result]][2]], add = TRUE, col = "firebrick", lty = 2, lwd = 4)

Ultimately you would need more nuance than "cover", since edges might only partially cover - but I think we are generally more interested in maintaining nice mesh-ness than in creating it.

[figure: the two panels plotted above, highlighting the a and b edges covered by the single c edge]
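A tiny illustration of that extra nuance (a hedged suggestion only; whether st_overlaps is the right predicate here is open): it flags line pairs whose interiors share a one-dimensional piece without either covering the other, which is exactly the partial-cover case st_covers misses.

library(sf)

a <- st_linestring(rbind(c(0, 0), c(2, 0)))
b <- st_linestring(rbind(c(1, 0), c(3, 0)))  # shares the piece [1, 2] with a
segs <- st_sfc(a, b)

st_covers(segs, segs)    # only self-covers: the partial overlap is invisible
st_overlaps(segs, segs)  # flags the a/b pair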


mpadge commented Oct 11, 2017

Hmm, that's a really interesting case. Would the silicate::PRIMITIVE representation then contain 10 edges rather than 11? No, it can't, because maybe only polygon c is first declared and stored, and only after that are a and b added. Reduction to only 10 primitive edges would entail modifying c just because a and b happened to be subsequently added, and that can't be a sensible approach. So c would have to remain "as is", with the ability to decompose implemented as some kind of post-hoc operation.

But that's cool; so far so good, until ... my notions of aggregation would then suggest an obverse problem. Say a and b are added, post-hoc decomposition is implemented, and the edge on c is decomposed into its two "conformable" parts ("conformable" meaning here that it conforms with what we know about the rest of the system at the time it's decomposed). Then we want to aggregate. What do we do? If a and b are still there, then the two edges in c need to remain as such, so we can unambiguously specify how c relates to a and b. Only if both are removed can the two edges once again be aggregated into one. This seems to suggest that aggregation can also only sensibly be implemented as a post-hoc operation. Oh shit, but that's not how I envisioned it ... time for further pondering ...


mdsumner commented Oct 11, 2017

Well, there's still a clear need for re-meshing only when asked for - if I wanted to triangulate those three polygons and drape them in 3D, then I would want the extra node, and for the three edges to become two, because otherwise there's a singularity-wall (or a gap) between the two surfaces if the edges traverse different profiles. If it's a network model, then it's one way to have flows disallowed between otherwise-neighbouring boundaries. So I see this as a positive really; it's something that's either disallowed or ambiguous in sf, whereas here it's clear and has a clear meaning. (I've found CGAL not robust to some segments that Triangle has no problems with, and this might actually be the situation that causes it ...)

I've been thinking about ways to tabulate the number of shared edges between shapes in a whole-set way, rather than laboriously as in my example above - it's really just a cross-check, but it could be quite powerful and provide a way to home in on topology edits.
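One possible whole-set version, assuming an object_link_edge table like the one EDGE() builds above (column names assumed, shared_edges is my name for the sketch): a self-join on edge plus a count tallies shared edges for every object pair at once.

library(dplyr)

shared_edges <- function(object_link_edge) {
  x <- distinct(object_link_edge, object, edge)  # one row per object-edge
  inner_join(x, x, by = "edge", suffix = c(".a", ".b")) %>%
    filter(object.a < object.b) %>%  # keep each unordered object pair once
    count(object.a, object.b, name = "n_shared")
}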


mpadge commented Oct 11, 2017

I do all the (1D) edge aggregation in C++ and doubt there'd be a sufficiently efficient R solution. I think algorithmically this ought not be too difficult. I just make loads of unordered_map objects between vertices and edges and vice-versa, and work with those. R can't approach the efficiency of the constant-time lookup of a C++ std::unordered_map, and the whole thing only gets read in and out once, so it ends up something like O(2N + e), where e is some small constant overhead. You can't get more efficient than that. In short: "laborious" is the only way. We really should start compiling a list of cases somewhere - as an issue in your silicate book repo?
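Purely to illustrate the two lookup directions (this is R pseudocode of the mapping logic, not the dodgr implementation, and it says nothing about performance), the maps amount to:

## toy edge table
edges <- data.frame(edge = c("e1", "e2", "e3"),
                    from = c("v1", "v2", "v3"),
                    to   = c("v2", "v3", "v4"),
                    stringsAsFactors = FALSE)

## vertex -> incident edges, and edge -> vertex pair
## (both are std::unordered_map objects in the C++ code)
vert2edge <- split(rep(edges$edge, 2), c(edges$from, edges$to))
edge2vert <- setNames(Map(c, edges$from, edges$to), edges$edge)

vert2edge[["v2"]]  # edges incident on v2
edge2vert[["e2"]]  # endpoints of e2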

I think the 1D case is sufficiently clear that I could have a go at a generic aggregator for that. What dodgr does is decompose into silicate-type structures, as I've told you, and the aggregator will then give a maximally compact sf representation for compatibility with other sf stuff. But those re-aggregated versions will never be the same as the originals, because they represent the minimally (or is that maximally?) disambiguous structures - no overlap, no crossing, no shared anything other than terminal vertices. And that's what I mean by 2D being in principle analogous and not too much more difficult: decompose based on a "proper" triangulation (cue your example again), then use that to aggregate back to the minimally disambiguous state. In your example above, the line in polygon c would then retain its two components. Bam, job done. Yeah, that should work, actually, now that you've done the groundwork for triangulation.
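The 1D criterion itself is simple enough to state in a few lines (a sketch assuming only vertex degree matters; real data would also have to respect where object attributes change): any vertex used by exactly two edges is redundant, and chains of such vertices collapse into a single line.

edges <- data.frame(edge = c("e1", "e2", "e3"),
                    from = c("v1", "v2", "v3"),
                    to   = c("v2", "v3", "v4"),
                    stringsAsFactors = FALSE)

degree <- table(c(edges$from, edges$to))
keep <- names(degree)[degree != 2]  # junction and terminal vertices only
## v2 and v3 have degree 2, so e1-e2-e3 aggregates to one line v1 -> v4
keep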

Okay, with that little clarity boost, I could probably get cracking on passing an sf::MULTILINESTRING object through sphier to spit out both the underlying silicate/sphier representation and a re-aggregated (and generally non-identical) MULTILINESTRING. (I'm referring to sphier only until we figure out my hierarchical concerns, but I expect everything to move into silicate when that's done. sphier is just an ideas pad for me at the moment.)

mdsumner commented

This will definitely work from an SC0 or SC representation; see here for a simple example of what can be done when arcs/ways are unique:

#64


mpadge commented Aug 17, 2019

@mdsumner More extensions to this issue as a result of the very constructive and positive reception of my recent silicate presentation to the Monash Numbats group. One recurrent theme throughout the questions was aggregation. I focussed strongly on silicate's ability to "disaggregate and re-aggregate" to arbitrary hierarchical levels. A "standard" representation is then the completely disaggregated level, in that it includes all data on every vertex; it was suggested that it would be useful to be able to "aggregate" that to a sufficient or desired level of "smoothing". A standard representation may often contain more data than necessary, and these could be reduced through some kind of generic smoothing/aggregation function. silicate may not be the place for such a function, but it's a very useful thought. Initial simple implementations could be something like edge smoothing (which I would of course take care of) and triangulation aggregation (which you would take care of). A couple of folk suggested this might immediately enhance general usage of silicate and silicate representations of data, which seems very sensible.
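One toy notion of the edge-smoothing end of this (entirely illustrative, and single-pass only; a real version would iterate and preserve vertices shared with other objects): drop interior vertices that are collinear with their neighbours, since they carry no geometric information.

smooth_edge <- function(x, y, tol = 1e-9) {
  n <- length(x)
  if (n < 3) return(cbind(x, y))
  ## cross-product test: zero means the interior vertex is collinear
  cross <- (x[3:n] - x[1:(n - 2)]) * (y[2:(n - 1)] - y[1:(n - 2)]) -
    (y[3:n] - y[1:(n - 2)]) * (x[2:(n - 1)] - x[1:(n - 2)])
  keep <- c(TRUE, abs(cross) > tol, TRUE)
  cbind(x, y)[keep, , drop = FALSE]
}

smooth_edge(c(0, 1, 2, 3), c(0, 0, 0, 1))  # drops the collinear vertex (1, 0)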
