Replies: 5 comments
-
Good question, and thanks for asking! That recursive representation was actually how I stored it initially (it seems like the most intuitive representation). The ambiguous representation was actually something that bothered me and I was happy when I realized it could be replaced by two lists :) I don't think I considered that the different recursive representations might carry more information. Practically, the biggest problem I had with the recursive representation was how to simplify conflicts. As a reminder, simplification is done when merging trees. It makes it so continuously rebasing a commit onto a new upstream doesn't make keep the conflict more complex every time. I found it hard to reason about and code the simplification when representing conflicts as recursive structs. For example, all of these can be simplified:
At least for the last two there, it's non-trivial to see what the correct simplification is. The list representation makes it very simple to do: you simply remove pairs of adds and removes that are the same.
I'm not sure what you mean. I can think of three different things you may be thinking about :)
Did at least one of those bullets answer your question?
I'm not sure I follow. Even if the internal representation is the list-based one, we could still call a 3-way merge tool with just "remove" and two "adds" as usual. Then if the tool or the user resolves those, we can replace those entries in the lists by just one "add" for the resolved content. Does that answer your question or did I misunderstand? By the way, I recently added a new type of conflict markers to Mercurial (https://phab.mercurial-scm.org/D9551). The inspiration to that format came from Jujube's handling of conflicts. I haven't gotten around to using the same format in Jujube, however. Off topic: I've recently finished making it so the "commit index" (like what Git calls the "split commit graph") is kept up to date within a transaction and so it is written incrementally to disk. I'm now converting existing callers to use the index to speed them up. Then I plan to implement revsets using that. Perhaps I'll rewrite the file diffing and file merging as mentioned above after that. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the detailed explanation!
This looks surprising at first read. But I understand it now. It's an elegant design!
Great examples! It took me some time to see how they work.
This explains it (esp. "trying all combinations"). All combinations still seem computationally prohibitive. But I guess the commit graph, past conflicts, or just the content of conflict can be used as heuristics to select promising combinations for 3-way merge.
Yeah, I wasn't thinking about using 3-way merge because it's unclear which "base" to use. I was thinking more about some non-traditional UI that presents more than 3 versions together (ex.
Looks useful. We might just steal it.
Last time (long ago) I checked the git's commit graph, my understanding about its key ideas are 1) introduce integer identities (file offset is a kind of identity) for O(1) lookup and 2) use generation numbers for early exit in linear algorithms. I wasn't so impressed because hg revlog has achieved both already. Are you trying to be compatible with upstream git or building something different? How would you compare it with revlog? For our commit graph, initially I thought about using generation numbers, or using commit hashes. Later I decided to use unique integer identities, similar to the hg's revision numbers. The main benefit is that commit hashes (hardly compressible) can now be lazy. We were jokingly talking about cloning an internal repo into a floppy disk size-wise. It is also friendly for large set representation (ex. spanset is O(1) space), and for |
Beta Was this translation helpful? Give feedback.
-
Yeah, even for an octopus merge with 10 parents, it would get pretty expensive. I'll probably first try that idea of pushing the conflicts down to the chunk level as I said. I suspect that will be good enough for almost all practical cases. It feels like all practical cases I can think of will either have non-overlapping conflicts or will result in conflicts regardless of which order the versions are merged.
Please do :) I've found it quite useful, but it takes a little while to get used to.
Yes, that's correct. Mercurial already achieves most of it with its revlog (generation numbers only provide a small benefit).
I have two backends (git and my own naive backend, which is mostly to show that it works) and I'm trying to depend as little as possible on the backend, so I've chosen my own format (I don't think it could be compatible with git and still support my other backend). Like git's commit graph, it's purely a cache/index (can be recalculated).
I remember that from your presentation at the SCM summit. I really like the "segmented revlog" idea. Unfortunately, I had already implemented my commit graph at that point (but it's taken until now to keep it up to date in memory within transactions). I'll definitely keep the idea in mind. I'd be happy to copy it, but there are so many more important pieces missing in Jujube still. FYI, I've tried hard to make Jujube safe to use on a distributed file system (and also to make it lock-free for concurrent local processes), so I don't use any mutable files. My impression is that your segmented revlog idea would work well with that constraint, even if you don't do it that way (or maybe you do, I haven't checked). |
Beta Was this translation helpful? Give feedback.
-
Currently it uses The |
Beta Was this translation helpful? Give feedback.
-
Closing this since the current representation can handle stacked conflicts. If there are anything to change, I think it might be worthwhile adding commit (and path) metadata so the commit graph can be used to pick merge bases (ex. for added A prefer A's p1 if A's p1 is in removes), but that's just nice-to-have features. I personally like the current design as-is, than the chunk-level conflicts. The reason is chunks (and diff algorithm) are ambiguous. In my view, the diff and chunking belong to more of the "merge tool"'s job. With that mindset, the default internal merge algorithm can be opinionated on diffing (therefore chunking), and try to merge non-conflicted regions as much as possible. The storage remains the snapshot form so the opinionated diff algorithm does not enforce its opinion on the storage layer. |
Beta Was this translation helpful? Give feedback.
-
You probably have thought about this. I was trying to compare the current conflict representation (two lists,
(adds, removes)
) with a conservative recursive struct (a tree of(local, other, base)
).To simplify the problem, only file modifications are considered, no deletions.
Consider these recursive structs:
They map to a same flat list representation (IIUC):
Apparently some information (ex. relations between merge base and local/other) is lost in this representation but does it matter? To reason about it, here is a question I tried to answer:
(adds: [A, B, D], removes: [C, E])
and produces a similar conflict-free merge result?I couldn't come up with such an algorithm easily. I wonder if that indicates a recursive representation is more practical. It seems there are other benefits, too, such as easier to integrate with existing merge tools and easier to store resolve progress.
Beta Was this translation helpful? Give feedback.
All reactions