Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get diff between specified commits #87

Open
ayaankazerouni opened this issue Oct 8, 2017 · 27 comments
Open

Get diff between specified commits #87

ayaankazerouni opened this issue Oct 8, 2017 · 27 comments

Comments

@ayaankazerouni
Copy link
Contributor

This is a feature request. I'd like a way to get the diff between two specified commits.

@davisjam
Copy link
Contributor

davisjam commented Oct 8, 2017

It looks like this is something both git and SVN support. I think an API for this can be added to the SCM interface without too much difficulty.

@ayaankazerouni Want to try a PR for the git path?

@ayaankazerouni
Copy link
Contributor Author

Yeah, I don't mind putting some time into it.

@ayaankazerouni
Copy link
Contributor Author

@mauricioaniche any thoughts or pointers?

@mauricioaniche
Copy link
Owner

How do you see one using this feature? You are visiting commits, and you need the diff between X and X - 3, let's say? And you need to do this over time, and that's why you need this feature in repodriller?

I agree that SCMRepository interface is the best place to contain such method. (In the future, we can think about breaking this interface into smaller ones, as it's growing up!)

@ayaankazerouni
Copy link
Contributor Author

ayaankazerouni commented Oct 10, 2017

My imagined use case is you've identified two commits of interest with a variable amount of time between them, and you need the diff. I don't really have to do anything 'over time' except identify the two commits of interest.

One could argue that repodriller need not support this; I thought it might provide a convenient wrapper around JGit's diffing functionality (which is not difficult, but kind of tedious to implement).

@davisjam
Copy link
Contributor

So @ayaankazerouni if I understand correctly:

During your RepoDriller scan you identify a bunch of pairs of commits of interest <X_i, Y_i>.
Or perhaps you have a starting commit X and identify a bunch of comparison commits Y_1 Y_2 etc.

While visiting each commit Y_i, you'd like your analysis to include "How different are X and this Y_i"?

This seems like a reasonable question to want to ask.

I don't think you can ask it smoothly right now as part of the RepoDriller interface. At the moment it looks like you would have to print all the <X_i, Y_i> pairs during your CommitVisitor and then do post-processing outside of RepoDriller, or do a bunch of JGit etc. hacking in your CommitVisitor.

@davisjam
Copy link
Contributor

@mauricioaniche Can you clarify when to put functionality into the SCM interface vs. the SCMRepository class?

@mauricioaniche
Copy link
Owner

Sounds good to me, then!

@davisjam I meant SCM interface, indeed!

@ayaankazerouni
Copy link
Contributor Author

@mauricioaniche -- I have some questions about the format in which we want the diff information returned.

Do we want to return a list of DiffEntries (from JGgit) or a list of Strings that would be parseable using RepoDriller's own DiffParser? Alternatively, we could build and return a list of Modifications like the type available in a Commit. Then we can interact with the diff in a familiar fashion. There is some code to do this in GitRepository that could be reused. Thoughts?

@davisjam
Copy link
Contributor

There is some relevant discussion in #62.

@ayaankazerouni
Copy link
Contributor Author

There is some relevant discussion in #62.

Sounds like something I can add as part of this pull request.

@davisjam
Copy link
Contributor

Agreed, seems like a good fit. Some clarity on the relationship between Diffs, Modifications, Commits, etc. in the Javadoc would also be helpful.

@mauricioaniche
Copy link
Owner

I'd say we should avoid exposing JGit details out of GitRepository; otherwise, clients would have to change when using SVN.

At this moment, I'd say either returning a simple String or repodriller's diff classes would be good. Then, as soon as we get more people using this feature, we'll make it better for them!

@ayaankazerouni
Copy link
Contributor Author

returning a simple String or repodriller's diff classes would be good

Ok, I'm returning a list of Modifications

@mauricioaniche
Copy link
Owner

List of modifications? Not sure if I understood. I thought the goal was to return the diff between version A and B!

@ayaankazerouni
Copy link
Contributor Author

ayaankazerouni commented Oct 13, 2017

While visiting commits, we use commit.getModifications() to get the changes made in the specified commit, correct? In GitRepository, that list of Modifications is built by diffing the current commit and its parent commit. I'm reusing that code, except instead of using a commit and its immediate parent, I use a commit and some specified prior commit, passed in as an additional argument.

EDIT: See the latest in #90 to see what I mean.

@davisjam
Copy link
Contributor

The goal is to return the diff between commit A and commit B, yes. This can be done either by:

  1. Returning the sequence of commits between A and B. But it's not clear what this means if A and B are e.g. on different branches.
  2. Returning the diff in terms of project state, expressed in the form of a List (Collection?) of Modifications.

I like 2 better. 1 seems too fine-grained and hard to define. 2 seems straightforward to work with.

I note, however, that since the RepoDriller idiom is that you are visiting a sequence of commits, it might make sense to have the diff here be between the commit you are visiting and the specified commit. I don't see a nice way to manage that given that CommitVisitor is an interface, not a class for which we can define private state, so offering the more general diff may be the best option.

@mauricioaniche
Copy link
Owner

I do not see Repodriller returning a list of Modification for this diff between commits feature.

I'd rather model it as an utility method, e.g.,List<DiffEntry> getDiffBetween(string hash1, string hash2) or String getDiffBetween(string hash1, string hash2).

Does that make sense?

@davisjam
Copy link
Contributor

Modification or DiffEntry is fine, I'm not sure of the difference yet. Not a String though, per #62: why provide a String when we have better options?

@ayaankazerouni
Copy link
Contributor Author

ayaankazerouni commented Oct 15, 2017

I agree with @davisjam, I'm not certain what the difference would be. DiffEntry is a object from JGit, tying this to to Git repositories, which I understand we don't want. Returning one large String seems kind of unwieldy to me, to be honest.

My intuition is that anyone using this diff will probably turn it into a Modification or DiffEntry or something anyway, instead of interacting with it as a String.

@davisjam
Copy link
Contributor

davisjam commented Oct 15, 2017

Ah, that's why I couldn't find DiffEntry in the RepoDriller source -- it's from JGit! If the proposed getDiffBetween(X, Y) is in SCM then it certainly shouldn't be returning a DiffEntry -- as @ayaankazerouni says, that ties other SCMs to JGit. Doesn't make sense.

Since a CommitVisitor gets an SCMRepository<->SCM, SCM needs to offer an abstract diff type. Review of the source suggests that Modification is basically an SCM-agnostic version of DiffEntry, so it's a good fit for this problem.

A String is the most portable between SCMs, but since a Modification offers a String we've got that covered.

So +1 for returning a List<Modification>. And whatever happens in #62, the improvement to Modification will show up here too.

@mauricioaniche
Copy link
Owner

Sorry for the confusion, I do not know all the class names by heart. It definitely does not make sense to pass JGit's internals to outside. However, our model contains DiffLine. See DiffBlock also.

Modification represents a modification that was made in a commit. It contains old path, new path, type of modification (RENAME, DELETE, ..).

What kind of diff are you talking about ? If you are talking about the text-based source code diff, then, Modification clearly does not make sense. If you are talking about showing the files, the type of modification, etc, etc, then, this entity makes sense.

@ayaankazerouni
Copy link
Contributor Author

If you are talking about showing the files, the type of modification, etc, etc, then, this entity makes sense

This is the type I'm talking about. I would like to interact with the diff the same way as we do in CommitVisitor. It might be that I end up simply getting the Strings, but returning a Modification gives me a choice of what to do without having to parse things myself.

@mauricioaniche
Copy link
Owner

Go for it! Let me know when the PR is done! :)

@ayaankazerouni
Copy link
Contributor Author

Ok PR #90 should be ready for review and merge now.

@davisjam
Copy link
Contributor

LGTM. Nice job.

@mauricioaniche
Copy link
Owner

Thanks! I'll review it soon!! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants