Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

edge cases with forks #29

Open
hackergrrl opened this issue Nov 19, 2016 · 10 comments
Open

edge cases with forks #29

hackergrrl opened this issue Nov 19, 2016 · 10 comments

Comments

@hackergrrl
Copy link
Contributor

Here are some edge cases regarding ways and forks of them and their nodes that we ought to consider & test for, pulled from @gmaclennan in IRC:

If a point is forked and one fork is inside the query and the other outside, does it just return the one inside? It probably should and probably does.

Here's a reason for some ghost points: when there is a forked way, the query lookup I believe will return the points associated with both forks of the way. But the points arenM- forked, so we return them all, but we choose only one of the forks of the way to return.

Also: if a way is forked and one fork is a delete, does the indexer correctly delete the reverse lookup? We should ensure we have a test for that.

@gmaclennan
Copy link
Member

Also: if a way is forked and one fork is a delete, does the indexer correctly delete the reverse lookup? We should ensure we have a test for that.

I think we may be hitting this. I am seeing features appear in Mapeo/iD which do not actually exist in the database when you try to query them. I have a feeling this is because of the following scenario:

  1. Way A is created with points B, C, D
  2. User X edits A by changing an attribute --> A'
  3. User Y deletes A (not A')
  4. X and Y replicate data. I think the lookup code in queryStream may return the deleted way when any of points B, C, D are in the bbox.

Can we write a test for this?

@hackergrrl
Copy link
Contributor Author

I wrote a test for this in osm-p2p-server (already had the scaffolding there), but what I see is the modified way being returned: digidem/osm-p2p-server@265d2ab

@hackergrrl
Copy link
Contributor Author

@gmaclennan I remember you describing another case involving overlapping bounding boxes, or similar. Could you describe that?

@gmaclennan
Copy link
Member

The bugs in Mapeo definitely seem related to queryStream lookups (from the api/0.6/map endpoint). iD makes queries to this end point for tiles at zoom 16 (regardless of the UI zoom of the map). This end point returns nodes outside the bounding box if they are referenced by ways that have nodes inside the bounding box, is accordance with the spec

What I am guess is happening is that this lookup code (essentially, hyperlog-index) has a bug. My guess would be to look at a forked way like the one described above and do a bounding box query that only includes one of its nodes. We should carefully think through what should actually happen in this scenario, both when we are requesting forks and when we are not. What happens when one of the "forks" is a deleted way?

@hackergrrl
Copy link
Contributor Author

After much digging, I believe I have an explanation. Consider a way with the following DAG structure in osm-p2p-db:

83b1221b4be413a30478fbb0d5be06395bd0c55d6a760ef4cf564531b10eccc2
  <- b9d60d6eb459b2e0fc693226c847b51995923c46c8fde6a20c3b94ff8579f929
    <- 6117e6d75ddb56985a53f3e116d3b4c0da8f2a231bd861c97d98071f02146759
      <- 86679afd770983ff67c529948ad5a053819ae1199a996d122e5927ba5d8aab10
        <- 495ee12c58537aa44fef7bb872ccf6cfebe8e23b3b490817b2e0d98bd11fccba
          <- e8c20112a059945fe8a3dec28972979b9b488cf62ab43c49556053e9fec4d450
            <- a7accc4532bb53283c7e582eee9558c418591b4c05eb8892ec3f5c233b2c55bf
              D dded60535063f8202e8ab7d549a78ce3ae787e39940e02846c818e2aba679f50
    <- ce2a6bac3aff1587415add7e0028a7da1d7a35832416bcd25306121fbaed0471

Where 83b1221.. is the root, and the others are children, grandchildren, etc. In this tree, many edits happened to the root way, eventually culminating in a deletion of the way (and all of its nodes). However, elsewhere on another machine, another edit took place from an earlier version of the way (ce2a6b..).

This results in the osm-p2p-db being in the following state:

  1. osm.get(osm_id_of_the_way) => ce2a6bac3aff1587415..
  2. osm.get(a_node_in_that_way) => null

What's happening is, after these forks exist in the same osm-p2p-db, the database considers the non-deleted way as real, but still thinks all of its nodes are deleted.

@gmaclennan
Copy link
Member

Yes, good find. When an element is deleted via the API we check whether it is referred to by any existing elements: https://github.com/digidem/osm-p2p-server/blob/master/lib/filter_deletes.js

Of course, as you point out, it could be referred to by forked elements on other peers.

I am not entirely sure about the best way to deal with this. I think if a way exists that refers to deleted nodes, we should return those nodes in the way.

I think osm.get() should have a way of distinguishing between a node that does not exist and a node that has been deleted. The OSM API uses an attribute visible=false for elements that have been deleted (http://wiki.openstreetmap.org/wiki/Elements#Common_attributes). We should perhaps use a similar convention, e.g. if an element exists but has been deleted osm.get() should return the element but with the property visible: false.

I think this is needed anyway for the filtering of forks to work correctly e.g. two forks of a point exist, and in Mapeo you would see the most recently edited. If you delete that fork, the de-forking code will return the non-deleted fork on the next request, so to the user it looks like the delete action failed.

@hackergrrl
Copy link
Contributor Author

hackergrrl commented Feb 3, 2017 via email

@gmaclennan
Copy link
Member

Yes, the issue of ways and relations referring to nodes/members only by OsmID is a big problem, not just for osm-p2p but for anybody dealing with historic OSM data. I know the Mapbox data team has hit this in their need to review data changes. The workaround they use I think is to use the timestamps to reconstruct which version of nodes/members were referenced by a particular way/relation. This is obviously fragile and costly, especially in a p2p system.

I think we can add versions to the way/relations in a way that remains compatible with existing clients:

  1. Internally we should store both id and version of nodes within a way and members within a relation.
  2. We should prefer version when doing a lookup in the index, but fallback to id.
  3. When a non-p2p-aware client submits a change, use the version of the nodes / members from the changeset if present in the changeset, if not set the version by selecting the most recent fork using the same algorithm that would have been used to present the most recent fork to the client.

Some issues we would hit:

  1. iD editor does not include any unchanged nodes in a changeset if you change the tags on a way.
  2. If you move a node in a way, the way itself is not included in the changeset.

The version number of a way would need to change every time a node was changed. iD might need a patch to ensure it pulls down the updated way after only a node was updated.

For relations, this would make it hard to avoid forks: if we store version numbers on relations, any update to a member would need to update the version of the relation. This would mean we would not be able to use relations for long rivers to avoid forks - any edit to any segment of the river would create a new version of the relation.

@gmaclennan
Copy link
Member

I think treating changesets as commits introduces new problems. This would mean we would have forks at a higher level (changesets) which would be more difficult to resolve.

@gmaclennan
Copy link
Member

See also: #21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants