Updated copy-tracing proposal to handle conflicts #4988

martinvonz · 2024-11-27T21:08:02Z

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1209241374115289

martinvonz · 2025-01-13T16:06:43Z

I'm getting tired of working on this doc, but I think it's good enough that it's reasonable clear how it will work. So please take a look if you're interested. If you spot problems, it's good to hear about them before I spend time polishing it.

docs/design/copy-tracking.md

torquestomp · 2025-01-14T17:19:47Z

docs/design/copy-tracking.md

+thanks to the same-change rule, we won't consider it a conflict, so maybe it
+actually works well in practice thanks.
+
+For example:


These examples appear incomplete - what is their relation, do they even have one?

ilyagr

I've read about half of the doc so far. It is very interesting; I think it's assembling into a sensible theory in my head (though my understanding is still slowly evolving).

There are a few things that I think could be clearer; I think I did manage to answer most of my own questions though. I still point them out; feel free to tell me if you'd like me to work on clarifying them.It might be a good exercise. I might try to get to the end of the doc first, though.

docs/design/copy-tracking.md

ilyagr · 2025-01-21T06:45:31Z

docs/design/copy-tracking.md

+had names `bar` and `foo`.
+
+To support merging two files into one, the list of past names is actually a DAG.
+Merging can happen in a merge commit when two sides copy/rename different


Presentation: the simplest example of this in my mind is the backout of a commit that copied a file, we could mention that as well.

I agree that it's simpler to produce a merged in the copy graph that way I think it's less intuitive, so I prefer to leave it as is.

ilyagr · 2025-01-22T22:41:38Z

docs/design/copy-tracking.md

+
+```
+$ jj log
+D delete baz


Update: This was a draft comment that got into the final review. I haven't answered this for myself yet, but I probably can.

What is the merge before editing? Conflict between bar and baz? Or both?

What if copies?

Yes, I think we should ideally make it a conflict and let the user decide what to do. This would presumably mean leaving the three trees in the commit even though they could be resolved if copy info is not considered.

docs/design/copy-tracking.md

ilyagr · 2025-01-23T02:52:13Z

docs/design/copy-tracking.md

+The `2:bar->1:foo` syntax above means that copy ID 2 has file `bar`, which was
+copied from copy ID `1`, where it was called `foo`.
+
+Diffing from `L` to `M` finds 2,3,4,5 as changed copy IDs. By walking their


Is this meant to include 1?

Also, it isn't clear to me how it is determined which copy IDs are "changed". Are you diffing the pairs (file_contents, copy_id) for each path? I don't immediately see whether this makes sense, but probably.

Update: I think the picture I explain two comments down might explain this.

ilyagr · 2025-01-23T02:53:53Z

docs/design/copy-tracking.md

+copied from copy ID `1`, where it was called `foo`.
+
+Diffing from `L` to `M` finds 2,3,4,5 as changed copy IDs. By walking their
+graphs, we find that 1,2, and 5 are related, while 3 and 4 are not.


Presentation: it might be nice to explain what happens in simpler cases: e.g. the diff between M and K, as well as L and K. I think the latter example might be helpful for illustrating my question above of how "changed copy ids" are determined.

ilyagr · 2025-01-23T02:58:32Z

docs/design/copy-tracking.md

+Diffing from `L` to `M` finds 2,3,4,5 as changed copy IDs. By walking their
+graphs, we find that 1,2, and 5 are related, while 3 and 4 are not.
+
+Because the `bar` and `baz` files have unrelated copy graphs, we break up their


Add: "in the commit M" I suppose it's true in "L" as well, but it's important which ones you are looking at.

More vaguely, I feel like this and the next paragraph need a clearer explanation if possible. One way to think of it is to draw the copy ID DAGs and point out which commits each node is in. The, we declare a node an add/delete if it's not connected to any node from the other commit. For each pair of (node from L) and (node from M) that are connected, we create a diff. Then explain how the diff is created by following the unique shortest path betweem the nodes.

I can try to write it, if it helps.

graph LR subgraph L["Commit L"] 2["2:bar"] 3["3:baz"] subgraph K["Commit K"] 1["1:foo"] end end subgraph M["Commit M"] 4["4:bar"] 5["5:baz"] end 2 --> 1 5 --> 1

Loading

As you say in the text, here 4:bar is new in M, 3:baz is deleted in L, and we have diffs between (5:baz from M and 1:foo from L), as well as (5:baz from M and 2:bar from L).

A good way to think of it is what I said above: for each pair of (node from L) and (node from M) that are connected via the DAG (or represent the same name, i.e. the same node in the DAG, which is a special case), we create a diff.

The type of diff (whether it's a rename, merge, or a split) seem to depend on which way the arrows in the copy ID DAG point, I'm guessing.

hese might get pretty complicated to describe, e.g. if you get something like the following path between A (a path in the commit that's the source of the diff) and D (a path in the target of the diff) in the DAG. I'm not sure what to do in these cases¹.

A <---- B ----> C <----- D

Another weird case: what if we care about the diff between A and D in

B ----> C <---- D A <---- B D B <---- E ----> D

I'm guessing (but not sure, this is very much worth double-checking) that these are possible graphs to get; they are certainly a DAG.

Here, B and D are single nodes, each with one incoming and one outgoing node. This is still a DAG.

Here's the second graph, redrawn in mermaid:

graph LR A["A, diff source"] D["D, diff target"] B --> A B --> C D --> C E --> B E --> D

Loading

Footnotes

there's probably some solution, at worst we could not cover these cases and consider A and D to be unrelated. ↩

ilyagr · 2025-01-23T03:09:06Z

docs/design/copy-tracking.md

+
+The `2:bar->1:foo` syntax above means that copy ID 2 has file `bar`, which was
+copied from copy ID `1`, where it was called `foo`.
+


Presentation: Perhaps I'd add something like:

Let's consider the diff from L to M. This is the same as the diff of the commit M2 we'd get by running jj new L; jj bookmark create M2; jj restore --from M --to M2 (which would result in the commit M2 having the same tree as M).

docs/design/copy-tracking.md

ilyagr · 2025-01-26T06:18:12Z

docs/design/copy-tracking.md

+
+Commit M:
+name: bar, id: M, copy_id: 4:bar
+name: baz, id: M, copy_id: 5:baz->1:foo


These arrows could be considered "backwards" by analogy to commits. baz has a pointer to foo, but this represents foo becoming baz. In the descriptions in the beginning of the section, the arrows go the other way.

My guess is that it's best to point the arrows the other way from the beginning (so, this arrow should be foo->baz, and the text should say that baz has a pointer to foo), but it's of course arguable.

I'm writing this because when I glanced at the mermaid diagram again, it took me a while to realize that I was thinking of arrows meaning the wrong thing.

PhilipMetzger

I just have some minor thougths and some typo fixes, I also don't consider supporting Git and its backend worth it.

docs/design/copy-tracking.md

PhilipMetzger · 2025-02-14T22:55:27Z

docs/design/copy-tracking.md

+never find any copies. Do we need an indexing pass to detect all renames in a
+repo when running `jj git init`? That can be very expensive for large repos.
+For reference, `git log --summary --find-copies-harder` takes about 165 seconds
+in the git.git repo on my computer.


This could have more examples, like Nixpkgs, mozilla-central and chromium and the results from the implementation you talked about in the Discord.

docs/design/copy-tracking.md

PhilipMetzger · 2025-02-14T23:09:43Z

docs/design/copy-tracking.md

+### Annotate
+


docs/design/copy-tracking.md

PhilipMetzger · 2025-02-14T23:32:46Z

docs/design/copy-tracking.md

+Do we ever want to record renames in the Git backend? If we do, we would
+presumably store it outside the Git object, similar to how we store the change
+id for commits.


Some thoughts, it'd be nice to support precise renames and conflicts in a best effort basis. But since the bar to entry here already is quite steep, which you point out nicely in the next paragraph, I'd rather go for a set of special commands to deal with it in the Git backend, when we communicate with a native backend.

I think if we're going to index copy information for Git repos anyway, then it doesn't seem like much extra work to make it possible for a user to modify that state themselves.

I think if we're going to index copy information for Git repos anyway, then it doesn't seem like much extra work to make it possible for a user to modify that state themselves.

Yes, that's a good point. I think I previously over-indexed on the best effort use-case.

docs/design/copy-tracking.md

This version describes a design that I think will work relatively well for supporting copy tracing even when there are conflicts. There are still some open questions, in particular around how to populate the data in Git repos, and how to efficiently query the data in very large repos like the one at Google. I think it's still clear enough that we can start working on the implementation.

martinvonz

I've addressed many of the comments. There are still many left to address.

FYI, @steadmon will be working on implementing the feature.

docs/design/copy-tracking.md

martinvonz · 2025-02-19T20:56:22Z

docs/design/copy-tracking.md

+
+```
+$ jj log
+D delete baz


Yes, I think we should ideally make it a conflict and let the user decide what to do. This would presumably mean leaving the three trees in the commit even though they could be resolved if copy info is not considered.

martinvonz · 2025-02-20T00:54:35Z

docs/design/copy-tracking.md

+had names `bar` and `foo`.
+
+To support merging two files into one, the list of past names is actually a DAG.
+Merging can happen in a merge commit when two sides copy/rename different


I agree that it's simpler to produce a merged in the copy graph that way I think it's less intuitive, so I prefer to leave it as is.

martinvonz · 2025-02-20T01:29:25Z

docs/design/copy-tracking.md

+### Annotate
+


docs/design/copy-tracking.md

martinvonz · 2025-02-21T22:12:21Z

docs/design/copy-tracking.md

+Do we ever want to record renames in the Git backend? If we do, we would
+presumably store it outside the Git object, similar to how we store the change
+id for commits.


I think if we're going to index copy information for Git repos anyway, then it doesn't seem like much extra work to make it possible for a user to modify that state themselves.

docs/design/copy-tracking.md

steadmon · 2025-02-26T23:57:10Z

docs/design/copy-tracking.md

+The `2:bar->1:foo` syntax above means that copy ID 2 has file `bar`, which was
+copied from copy ID `1`, where it was called `foo`.
+
+Diffing from `L` to `M` finds 2,3,4,5 as changed copy IDs. By walking their


I think I got tripped up over something silly here. You say "Diffing from L to M", which makes me think L is the source, and M is the destination. However, later you say "foo does not exist on the source side", which implies the opposite.

`jj resolve` will currently try to resolve the executable bit and will clear it if it can't be resolved. I don't think the executable bit should be affected by the external merge tool. We could preset the flags on the files we pass to the merge tool and then expect the merge tool to update the flags, but we don't do that. I'm not sure that's a good idea (few merge tools probably expect that), but since we don't do it, I think it's better to leave the executable bits unchanged. This patch makes it so by calling `Merge::with_new_file_ids()` on the original conflict whether or not the content-level conflict was resolved. I noticed this was while working on the copy-tracking proposal in PR #4988, which involves adding copy information in the `TreeValue`. If we do implement that, then we should preserve copy information here too (in addition to the executable flag). That will automatically work after this patch.

martinvonz force-pushed the push-msqlmmqooqmp branch from 9fcc8e4 to 0bae747 Compare November 27, 2024 21:11

martinvonz force-pushed the push-msqlmmqooqmp branch 2 times, most recently from e178ecb to e1d524b Compare December 10, 2024 07:06

martinvonz force-pushed the push-msqlmmqooqmp branch from e1d524b to 30c657d Compare December 17, 2024 18:41

martinvonz force-pushed the push-msqlmmqooqmp branch 2 times, most recently from bebb15f to 3ef54f1 Compare January 13, 2025 16:04

martinvonz marked this pull request as ready for review January 13, 2025 16:04

yuja reviewed Jan 14, 2025

View reviewed changes

docs/design/copy-tracking.md Outdated Show resolved Hide resolved

torquestomp reviewed Jan 14, 2025

View reviewed changes

ilyagr reviewed Jan 23, 2025

View reviewed changes

martinvonz force-pushed the push-msqlmmqooqmp branch from 3ef54f1 to b6a7d8d Compare January 24, 2025 19:07

ilyagr reviewed Jan 26, 2025

View reviewed changes

PhilipMetzger reviewed Feb 14, 2025

View reviewed changes

martinvonz force-pushed the push-msqlmmqooqmp branch 2 times, most recently from 194943d to 864cf6e Compare February 21, 2025 22:13

martinvonz force-pushed the push-msqlmmqooqmp branch from 864cf6e to afe30b3 Compare February 21, 2025 22:14

martinvonz commented Feb 21, 2025

View reviewed changes

steadmon reviewed Feb 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updated copy-tracing proposal to handle conflicts #4988

Updated copy-tracing proposal to handle conflicts #4988

martinvonz commented Nov 27, 2024 •

edited by arxanas

Loading

martinvonz commented Jan 13, 2025

torquestomp Jan 14, 2025

ilyagr left a comment

ilyagr Jan 21, 2025

martinvonz Feb 20, 2025

ilyagr Jan 22, 2025 •

edited

Loading

martinvonz Feb 19, 2025

ilyagr Jan 23, 2025

ilyagr Jan 23, 2025

ilyagr Jan 23, 2025

ilyagr Jan 23, 2025 •

edited

Loading

ilyagr Jan 23, 2025 •

edited

Loading

ilyagr Jan 23, 2025

ilyagr Jan 26, 2025 •

edited

Loading

PhilipMetzger left a comment

PhilipMetzger Feb 14, 2025

PhilipMetzger Feb 14, 2025

martinvonz Feb 20, 2025

PhilipMetzger Feb 14, 2025

martinvonz Feb 21, 2025

PhilipMetzger Feb 21, 2025

martinvonz left a comment

martinvonz Feb 19, 2025

martinvonz Feb 20, 2025

martinvonz Feb 20, 2025

martinvonz Feb 21, 2025

steadmon Feb 26, 2025


		The `2:bar->1:foo` syntax above means that copy ID 2 has file `bar`, which was
		copied from copy ID `1`, where it was called `foo`.

		### Annotate

		### Annotate

Updated copy-tracing proposal to handle conflicts #4988

Are you sure you want to change the base?

Updated copy-tracing proposal to handle conflicts #4988

Conversation

martinvonz commented Nov 27, 2024 • edited by arxanas Loading

Checklist

martinvonz commented Jan 13, 2025

Choose a reason for hiding this comment

ilyagr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ilyagr Jan 22, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ilyagr Jan 23, 2025 • edited Loading

Choose a reason for hiding this comment

ilyagr Jan 23, 2025 • edited Loading

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

ilyagr Jan 26, 2025 • edited Loading

Choose a reason for hiding this comment

PhilipMetzger left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martinvonz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martinvonz commented Nov 27, 2024 •

edited by arxanas

Loading

ilyagr Jan 22, 2025 •

edited

Loading

ilyagr Jan 23, 2025 •

edited

Loading

ilyagr Jan 23, 2025 •

edited

Loading

ilyagr Jan 26, 2025 •

edited

Loading