-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
edge cases with forks #29
Comments
I think we may be hitting this. I am seeing features appear in Mapeo/iD which do not actually exist in the database when you try to query them. I have a feeling this is because of the following scenario:
Can we write a test for this? |
I wrote a test for this in osm-p2p-server (already had the scaffolding there), but what I see is the modified way being returned: digidem/osm-p2p-server@265d2ab |
@gmaclennan I remember you describing another case involving overlapping bounding boxes, or similar. Could you describe that? |
The bugs in Mapeo definitely seem related to queryStream lookups (from the What I am guess is happening is that this lookup code (essentially, hyperlog-index) has a bug. My guess would be to look at a forked way like the one described above and do a bounding box query that only includes one of its nodes. We should carefully think through what should actually happen in this scenario, both when we are requesting forks and when we are not. What happens when one of the "forks" is a deleted way? |
After much digging, I believe I have an explanation. Consider a way with the following DAG structure in osm-p2p-db:
Where This results in the osm-p2p-db being in the following state:
What's happening is, after these forks exist in the same osm-p2p-db, the database considers the non-deleted way as real, but still thinks all of its nodes are deleted. |
Yes, good find. When an element is deleted via the API we check whether it is referred to by any existing elements: https://github.com/digidem/osm-p2p-server/blob/master/lib/filter_deletes.js Of course, as you point out, it could be referred to by forked elements on other peers. I am not entirely sure about the best way to deal with this. I think if a way exists that refers to deleted nodes, we should return those nodes in the way. I think I think this is needed anyway for the filtering of forks to work correctly e.g. two forks of a point exist, and in Mapeo you would see the most recently edited. If you delete that fork, the de-forking code will return the non-deleted fork on the next request, so to the user it looks like the delete action failed. |
I think the core problem is that ways refer to its nodes by their OsmID
rather than their OsmVersion.
OsmID is a vague notion of "the latest version", which means if we don't
select the latest version of *every* element we return (in the
non-forking case), we'll always get edge cases with inconsistencies,
even with the approaches you've mentioned.
If we had ways and relations refer to *specific versions* of OSM
documents, we would have greater consistency. We could then choose ANY
version of any way, and see what nodes it has, regardless of whether
they were deleted in some other reality.
I liken our current model to git, except we have commits that refer to
filenames instead of specific file version hashes: you couldn't possibly
expect reasonable results.
Following that analogy, maybe that's how we ought to model our data:
where each changeset is essentially an atomic commit, and you can only
choose which specific changeset you want to treat as your HEAD. This
means we inherent all of the consistency and soundness of git's
model[1].
[1] But perhaps also the merge conflict pains as well.
|
Yes, the issue of ways and relations referring to nodes/members only by OsmID is a big problem, not just for osm-p2p but for anybody dealing with historic OSM data. I know the Mapbox data team has hit this in their need to review data changes. The workaround they use I think is to use the timestamps to reconstruct which version of nodes/members were referenced by a particular way/relation. This is obviously fragile and costly, especially in a p2p system. I think we can add versions to the way/relations in a way that remains compatible with existing clients:
Some issues we would hit:
The version number of a way would need to change every time a node was changed. iD might need a patch to ensure it pulls down the updated way after only a node was updated. For relations, this would make it hard to avoid forks: if we store version numbers on relations, any update to a member would need to update the version of the relation. This would mean we would not be able to use relations for long rivers to avoid forks - any edit to any segment of the river would create a new version of the relation. |
I think treating changesets as commits introduces new problems. This would mean we would have forks at a higher level (changesets) which would be more difficult to resolve. |
See also: #21 |
Here are some edge cases regarding ways and forks of them and their nodes that we ought to consider & test for, pulled from @gmaclennan in IRC:
The text was updated successfully, but these errors were encountered: