Skip to content

Conversation

leander-dsouza
Copy link
Collaborator

Basic Info

Info Please fill out this column
Ticket(s) this addresses #148
Primary OS tested on Ubuntu
Is this a breaking change? No
Does this PR contain AI generated software? No

Description of contribution in a few bullet points

  • Added an extends field for vcs import to facilitate inheritance.
  • Added safeguards for avoiding circular imports.
  • Updated the testing framework to account for various edge cases regarding the extends feature.
  • Updated the README to include information on how to use the new field.

Description of how this change was tested

  • Ran pytest locally to ensure all the unit and linting tests pass:

    pytest -s -v test
  • Created a local fork to validate green CI.

Signed-off-by: Leander Stephen Desouza [email protected]

@leander-dsouza leander-dsouza marked this pull request as ready for review August 24, 2025 23:24
@leander-dsouza leander-dsouza force-pushed the leander-dsouza/add-inheritance branch from 898c082 to 3361a90 Compare August 26, 2025 03:03
Copy link
Member

@christophebedard christophebedard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments and suggestions.


More of a long-term, high-level discussion: Did you see the discussion on the original PR about being able to extend multiple files and import multiple files and the semantics of that? dirk-thomas/vcstool#148 (comment). How do you think that should behave? Dirk was suggesting not allowing (warning or erroring on) identical entries for sibling files: not allowing two files being extended to contain the same repository path (e.g., src/vcs2l: under repositories:). I think I kind of missed that back then and intended to just merge sibling entries and prioritize from top to bottom instead:

extends:
  - a.repos  # May override entries from b.repos
  - b.repos
repositories:
  repo: ...  # May override entries from the combination of a.repos and b.repos

We can probably figure this out after this PR, though.

@claraberendsen claraberendsen added the enhancement New feature or request label Sep 9, 2025
@leander-dsouza leander-dsouza force-pushed the leander-dsouza/add-inheritance branch from 3361a90 to 8c7697c Compare September 15, 2025 13:53
@leander-dsouza
Copy link
Collaborator Author

leander-dsouza commented Sep 15, 2025

Some comments and suggestions.

More of a long-term, high-level discussion: Did you see the discussion on the original PR about being able to extend multiple files and import multiple files and the semantics of that?

Yes, it was quite straightforward to extend the implementation to support multiple files.
The very last commit of this PR is focused on tackling this.

I have included documentation, tests and error support if the user tries to include duplicate files.

@leander-dsouza
Copy link
Collaborator Author

I can understand that the documentation can take up a lot of space in the README.
I was hoping that upon review of this PR, we can safely migrate this into specific pages in the documentation:

https://ros-infrastructure.github.io/vcs2l

@christophebedard
Copy link
Member

christophebedard commented Sep 17, 2025

More of a long-term, high-level discussion: Did you see the discussion on the original PR about being able to extend multiple files and import multiple files and the semantics of that?

Yes, it was quite straightforward to extend the implementation to support multiple files. The very last commit of this PR is focused on tackling this.

I have included documentation, tests and error support if the user tries to include duplicate files.

Sorry, I meant to initiate a discussion and not really request you to implement something, because, from the initial discussion with Dirk, there were two options when supporting multiple files under extends:, and I wasn't sure what the best option was:

  1. Allow files to include the same repository path (e.g., two files including src/vcs2l: under repositories:), but define precedence, e.g., bottom file takes precedence, like you did
  2. Do not allow files to include the same repositories; report an error

I think Dirk's point was that it makes sense for a file extending another file to override some paths (repos), but it may or may not make sense for a file being extended to override some paths from another file being extended alongside it.

@leander-dsouza
Copy link
Collaborator Author

Sorry, I meant to initiate a discussion and not really request you to implement something

Apologies for misreading the conversation. I can drop the specific commit if this does not pertain to the PR.

I wasn't sure what the best option was:

  1. Allow files to include the same paths, but define precedence, e.g., bottom file takes precedence, like you did
  2. Do not allow files to include the same paths; report an error

I am currently not allowing the user to specify the same path by raising the error here.

I feel that this is a more simplistic approach, which limits the user to specifying distinct paths as part of the extended files.
I believe that by restricting this option, the file hierarchy becomes easier to track and follow.

Please let me know if this is a viable method to follow through, as this file structure can be restrictive and opinionated.

@christophebedard
Copy link
Member

I wasn't sure what the best option was:

  1. Allow files to include the same paths, but define precedence, e.g., bottom file takes precedence, like you did
  2. Do not allow files to include the same paths; report an error

I am currently not allowing the user to specify the same path by raising the error here.

I feel that this is a more simplistic approach, which limits the user to specifying distinct paths as part of the extended files. I believe that by restricting this option, the file hierarchy becomes easier to track and follow.

Please let me know if this is a viable method to follow through, as this file structure can be restrictive and opinionated.

Sorry, my comment wasn't clear, especially without the original context. See my updated comments. By "the same paths," I meant the same repository paths (not .repos files paths), e.g., if two files being extended have src/vcs2l: under repositories:.

In any case, I think allowing identical repository paths/entries is fine for now. I can see having a --strict-extend option later to report an error in case of duplicate repository entries.

…e in import command.

Signed-off-by: Leander Stephen D'Souza <[email protected]>
@leander-dsouza
Copy link
Collaborator Author

Sorry, my comment wasn't clear, especially without the original context. See my updated comments. By "the same paths," I meant the same repository paths (not .repos files paths), e.g., if two files being extended have src/vcs2l: under repositories:.

Oh my bad, I think since we are allowing duplicate repository entries as of now, this should be resolved.
By defining a priority structure (higher priority items stay at the bottom of the list), the ambiguities should nullify, and provide the user with a lot more flexibility.

I can see having a --strict-extend option later to report an error in case of duplicate repository entries.

Yes, sure. I can add this as a feature request issue once this PR gets merged with main.

try:
# Recursively get repositories from parent file
with open(parent_file, 'r', encoding='utf-8') as parent_f:
parent_repos = get_repositories(parent_f, visited_files.copy())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a quick look, this .copy() operation caught my attention here. Is it really required? Can we find a bug that happens in no copy is performed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is required.
This is to ensure that each DFS iteration gets its own copy of visited_files to work with.
This is to prevent unnecessary CircularImportError exceptions from being raised when the extended files have the same dependency.

For instance, consider the following example:

base.yaml
  extends: [a.yaml, b.yaml]

a.yaml
  extends: shared.yaml
  
b.yaml
  extends: shared.yaml

shared.yaml
  (no extends)

Without .copy():

  1. Process base.yaml → adds it to visited_files
  2. Process a.yaml → adds it to the same visited_files set
  3. Process shared.yaml → adds it to the same visited_files set
  4. Return from processing a.yaml
  5. Try to process b.yaml → adds it to visited_files
  6. Try to process shared.yaml again - CircularImportError due to step 3.

TLDR: The .copy() method prevents sibling files with the same dependency from raising a circular import error.

@j-rivero
Copy link

j-rivero commented Oct 3, 2025

Thanks @leander-dsouza for implementing this, quite a nice work of defining the semantics and the implementation. I just left a minor command, overall looks great.

As a proof of concept I used this and encode gazebodistro files for the Jetty distribution gazebo-tooling/gazebodistro#224 and 🎉 almost everything removed but the codification of the library sources and a declaration of their dependencies in each file. Isn't wonderful?

Couple more of comments: for packaging in debian (particularly for the testing) we are going to need "offline" tests, but I saw that @cottsay was already coded a solution. I'm not particularly fan of encoding the shell output for testing, but i think that this is how the repository tests were implemented so it is more a topic for a repository wise discussion.

@cottsay
Copy link
Member

cottsay commented Oct 3, 2025

I'm not particularly fan of encoding the shell output for testing, but i think that this is how the repository tests were implemented so it is more a topic for a repository wise discussion.

I'm also not thrilled about the console output captures. I think we all have a handful of changes we want to make to the testing for this package. Trying to keep #72 focused on getting things offline first, then we can make incremental improvements.

@leander-dsouza
Copy link
Collaborator Author

Trying to keep #72 focused on getting things offline first, then we can make incremental improvements.

I totally agree.
Once the offline tests PR is merged, I will modify this PR to include tests that match the structure drafted in #72.

@asherikov
Copy link

I see the point of this change, but solution is not great imo:

  1. The original goal can be achieved with a simple sequential merge of multiple yaml files. (I doubt that anyone needs to deal with diamond inheritance and other fancy stuff.)
  2. This feature modifies repository list format, possibly breaking compatibility with other tools.
  3. Paths to extensions are hardcoded in repository lists, which may be quite inconvenient in practice.

Accepting multiple repo lists and merging them sequentially seems to be less invasive and more flexible.

@leander-dsouza
Copy link
Collaborator Author

Thank you for the feedback, Alex.
I have discussed some of the concerns that you have raised below:

  1. The original goal can be achieved with a simple sequential merge of multiple yaml files. (I doubt that anyone needs to deal with diamond inheritance and other fancy stuff.)

The implementation in this PR is a sequential merge of multiple YAML files with priority.
If there are no duplicates, then all the repository files are simply merged. If the same repository is shared, then only the last item in the list overrides the initial definition.

If you take how the repository files are defined in gazebo-tooling/gazebodistro#224, having multiple inheritance is a great way to simplify files. Moreover, you can have a single base template of dependencies and create "children" for your cause by merely overriding a particular dependency choice.

  1. This feature modifies repository list format, possibly breaking compatibility with other tools.

The addition of the extends: field is purely optional, and the format of the older definition of the repositories is still intact.

  1. Paths to extensions are hardcoded in repository lists, which may be quite inconvenient in practice.

The paths to the extends: field supports relative paths when using the --input parameter. Hence, mentioning the whole hardcoded path is not necessary for inheritance.

@asherikov
Copy link

  1. Embrace-extend-extinguish? ;) What I mean is that other tools need to be modified as well to support the new logic. In my tool this feature is already implemented by allowing multiple repository lists (https://github.com/asherikov/wshandler)
  2. You are focused on one use case, where extensions are diverging from the same base, but it can be the other way: one extension to multiple bases.

@leander-dsouza
Copy link
Collaborator Author

leander-dsouza commented Oct 9, 2025

  1. Embrace-extend-extinguish? ;) What I mean is that other tools need to be modified as well to support the new logic. In my tool this feature is already implemented by allowing multiple repository lists (https://github.com/asherikov/wshandler)

Apologies for implying the changes are EEE. This was motivated by a feature request mentioned at dirk-thomas/vcstool#148.

Since the tool is regularly used in the community, I appreciate your concern. Could you please provide some examples/scenarios that this incoming change would prove to be inferior to your implementation at wshandler?

  1. You are focused on one use case, where extensions are diverging from the same base, but it can be the other way: one extension to multiple bases.

The tool currently can be extended from multiple bases/repository files. There are no restrictions on the amount of extensions you can have for your implementation.

@asherikov
Copy link

I am sorry I wasnt clear:
the proposed solution works on

a.yaml
  extends:  base.yaml  
base.yaml
  (no extends)

but what if you need

a.yaml
  extends: {base1.yaml or base2.yaml}
base1.yaml
  (no extends)
base2.yaml
  (no extends)

The issue here is that extended files are hardcoded.

@leander-dsouza
Copy link
Collaborator Author

leander-dsouza commented Oct 9, 2025

but what if you need

a.yaml
  extends: {base1.yaml or base2.yaml}
base1.yaml
  (no extends)
base2.yaml
  (no extends)

Multiple inheritance is supported in this contribution. For your above use case, the syntax is as follows (example file):

---
extends:
  - base_1.yaml  # Lower priority
  - base_2.yaml  # Higher priority
repositories:
  vcs2l:
    type: git
    url: https://github.com/ros-infrastructure/vcs2l.git
    version: 1.1.5

Could you please clarify what do you mean by:

  • extends: {base1.yaml or base2.yaml} - Does this mean both files are imported? , or does it mean that if the base1.yaml path does not exist, then base2.yaml is imported? I wanted to clarify if it is a logical OR operation.

  • The issue here is that extended files are hardcoded. - Do you mean the file path to the extended files is hardcoded? , as the PR supports relative paths.

You can refer to some of the tests attached for each possible use case as well.

@asherikov
Copy link

asherikov commented Oct 9, 2025

extends: {base1.yaml or base2.yaml}

Either base1 or base2, e.g., two versions of a package stack maintained by different teams or targeting different platforms are extended by the same patch provided by a third party.

PR supports relative paths.

It is not about relative or absolute paths: consider a situation where repolists are located in separate repos that are checked out in arbitrary locations. Assumption that both the base and extending repo lists are distributed together is too strong.

@claraberendsen
Copy link
Contributor

@asherikov thanks for explaining a bit further, I'm trying to understand the use cases / potential drawbacks you are presenting here:

Either base1 or base2, e.g., two versions of a package stack maintained by different teams or targeting different platforms are extended by the same patch provided by a third party.

The current implementation supports extending from multiple bases like you comment and merges yaml files by taking the latest appearance of a specific repo. The updated README has a detailed write up on how this is structured. Could you exemplify why this wouldn't work for your use case ?

It is not about relative or absolute paths: consider a situation where repolists are located in separate repos that are checked out in arbitrary locations. Assumption that both the base and extending repo lists are distributed together is too strong.

This is a limitation of the current implementation as I understand it (correct me if I'm wrong @leander-dsouza). You cannot input an extension repolist that is not distributed in the same repo you are defining the extension, there is a locality concept to the implementation. I see the value of being able to extend from external sources when compositing your repolist, though I'm not certain where the balance falls between the added complexity and value.

Accepting multiple repo lists and merging them sequentially seems to be less invasive and more flexible.

This was discussed in the OG issue and what I think is lost by just allowing multiple repolists on --input is the declaration of that relationship. Declaring an extension in the repolist removes a layer of needed knowledge to replicate the same outcome. Using the extends: key maintains the same concept of the current implementation: the repolist defines all repositories that need to be checkout while reducing duplication and maintenance burden.

@asherikov
Copy link

The current implementation supports extending from multiple bases like you comment and merges yaml files by taking the latest appearance of a specific repo. The updated README has a detailed write up on how this is structured. Could you exemplify why this wouldn't work for your use case ?

Nope, this is a different use case. I dont mean to extend two bases simultaneously, but rather to extend either one or another, with the current implementation you would have to either maintain two versions of extension file, or edit extends manually every time you apply it.

This was discussed in the OG issue and what I think is lost by just allowing multiple repolists on --input is the declaration of that relationship.

Ordering quite naturally implies priority, same as in your example:

extends:
  - base_1.repos  # Lower priority
  - base_2.repos  # Higher priority

If necessary an extra parameter may specify how conflicts are to be handled, e.g., fail, prioritize last/prioritize first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants