
Make invenio-github support other VCS providers #188

@palkerecsenyi


Parent issue: CERNDocumentServer/cds-rdm#440


This issue is part of inveniosoftware/product-rdm#226 to add a GitLab integration. In order to reduce code duplication, the approach will be to adapt invenio-github and turn it into a "generic" module that supports any Version Control System (VCS) as long as it provides the necessary APIs and functionality. Implementations for specific VCSes like GitHub and GitLab will be provided in new contrib files.
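As a rough illustration of the "generic module plus contrib implementations" idea, the sketch below shows one possible shape for a provider interface. Every name in it (`VCSProvider`, `Repository`, `list_repositories`, `register_provider`, `InMemoryProvider`) is hypothetical and invented for this example; it does not describe the actual invenio-vcs API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Provider-neutral view of a repository (hypothetical fields)."""

    provider: str      # which provider the repo lives on, e.g. "gitlab"
    provider_id: str   # ID assigned by that provider
    name: str          # human-readable name, e.g. "group/project"


class VCSProvider(ABC):
    """Hypothetical interface that each contrib implementation would satisfy."""

    id: str

    @abstractmethod
    def list_repositories(self, user_id: str) -> list[Repository]:
        """Return the repositories the given user can access."""


class InMemoryProvider(VCSProvider):
    """Toy provider backed by a dict, used only to exercise the interface."""

    def __init__(self, provider_id: str, repos: dict[str, list[tuple[str, str]]]):
        self.id = provider_id
        self._repos = repos  # maps user_id -> [(provider_id, name), ...]

    def list_repositories(self, user_id: str) -> list[Repository]:
        return [
            Repository(self.id, repo_id, name)
            for repo_id, name in self._repos.get(user_id, [])
        ]


# Registry so that generic code never has to name a concrete provider class.
PROVIDERS: dict[str, VCSProvider] = {}


def register_provider(provider: VCSProvider) -> None:
    PROVIDERS[provider.id] = provider


def repositories_for(provider_id: str, user_id: str) -> list[Repository]:
    """Generic code path: look up the provider and delegate to it."""
    return PROVIDERS[provider_id].list_repositories(user_id)
```

With a layout like this, a GitHub or GitLab contrib file would only need to subclass the interface and register itself; the generic views and tasks would go through the registry.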

Stage 1

Aiming to complete a fully functional, production-ready, well-documented MVP. We will regard it as complete when:

  • it provides an end-user experience equivalent to the current GitHub integration, and as similar to it as possible.
    • The current design has some scalability issues, but for now we will avoid changing too much.
  • a clear migration script and guide are available and have been thoroughly tested with existing Zenodo data
  • unit tests have been updated
  • as many bugs as possible have been fixed

Work is split across several PRs to make reviewing easier. These are all being merged into the vcs-staging branch, which keeps unreleasable code safely separate from master. Once all the PRs of the first stage are merged, we can merge into master with a single squash commit. Before merging into master, a maint-v3.x branch should be created to continue maintaining the old GitHub-only version of this module. The first stage consists of:

To see the overall, non-fragmented changes for invenio-vcs, please see my fork's master branch.

Todo for stage 1

  • GitLab contrib. This is a priority as it's needed to test a lot of the other features; OAuth in particular is very difficult to test without it.
  • OAuth user ID correlation
    • i.e. if the VCS provider uses the same OAuth server to authenticate the user as the Invenio instance, we should check the user IDs to make sure they match. This is useful for CDS-RDM where users will be able to link CERN GitLab, which uses the same CERN SSO.
    • We could express this through a more versatile hook function that returns whether or not we should accept the authenticated user.
    • Update: This can be done relatively easily by configuring a custom info_serializer handler in invenio.cfg. See the example for CDS: WIP: User ID validator for GitLab CERNDocumentServer/cds-rdm#554
  • Sync VCS repositories straight into the vcs_repositories table instead of the OAuth remote user extra_data field.
    • This will make querying a lot easier, so we can paginate and search on the repository list page. That page is currently very slow for users on e.g. GitLab instances where group membership gives them access to thousands of repos.
  • Check duplication for organisational/team repos if multiple people activate them
    • What happens if a user is deleted? How can we transfer the repos?
  • Repo name should not be unique on its own. It is unique only as part of the tuple (provider_id, provider, name).
  • UI bug with menu not being able to differentiate between multiple dynamically-registered entries
    • For example: image
  • Unit tests
  • Documentation
  • Migration script and guide
  • Careful testing of DB migration for existing GitHub repos/releases
  • Some UI pages have not been adapted and continue to throw errors
  • JSONB extra_data in oauthclient
  • Correct handling of dependency in InvenioRDM
    • invenio-vcs is now an optional dependency, including in InvenioRDM. Whether the integration is enabled depends on whether the dependency is installed.
    • However, some higher-level bindings in invenio-app-rdm and invenio-rdm-records override classes in invenio-vcs without checking that it's installed, which causes a crash if it isn't.
    • We need to find a neat way of avoiding this issue
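The two database-related items above (syncing into the vcs_repositories table, and uniqueness as a tuple rather than on the name alone) can be sketched with a small, self-contained example. This uses the standard-library `sqlite3` module purely for illustration; the column layout and the helper names `create_schema`, `add_repository` and `search_repositories` are assumptions made for this sketch, not the real invenio-vcs schema or API.

```python
import sqlite3


def create_schema(conn):
    """Create a toy vcs_repositories table (hypothetical columns)."""
    conn.execute(
        """
        CREATE TABLE vcs_repositories (
            id INTEGER PRIMARY KEY,
            provider TEXT NOT NULL,
            provider_id TEXT NOT NULL,
            name TEXT NOT NULL,
            -- The name alone is not unique; only the full tuple is.
            UNIQUE (provider_id, provider, name)
        )
        """
    )


def add_repository(conn, provider, provider_id, name):
    """Insert one repository; raises sqlite3.IntegrityError on a duplicate tuple."""
    conn.execute(
        "INSERT INTO vcs_repositories (provider, provider_id, name) "
        "VALUES (?, ?, ?)",
        (provider, provider_id, name),
    )


def search_repositories(conn, provider, query="", page=1, size=10):
    """Return one page of repository names matching a substring query."""
    offset = (page - 1) * size
    rows = conn.execute(
        "SELECT name FROM vcs_repositories "
        "WHERE provider = ? AND name LIKE ? "
        "ORDER BY name LIMIT ? OFFSET ?",
        (provider, f"%{query}%", size, offset),
    )
    return [row[0] for row in rows]
```

Because filtering and paging happen in the database, a user with thousands of repositories only ever loads one page of rows, which is the motivation given for moving away from the extra_data field.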
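For the optional-dependency problem described above, one possible approach is to guard the overrides behind an installation check. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper names `is_installed` and `apply_overrides_if_available` are hypothetical, and how the real bindings in invenio-app-rdm and invenio-rdm-records would hook into this is left open.

```python
import importlib
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def apply_overrides_if_available(module_name, apply):
    """Call ``apply(module)`` only when the optional module is installed.

    Returns True if the overrides were applied and False if they were skipped.
    """
    if not is_installed(module_name):
        return False
    apply(importlib.import_module(module_name))
    return True
```

A higher-level package could wrap its class overrides in a function and pass it as `apply`, so that a missing optional dependency results in a skipped step rather than a crash at import time.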

Stage 2

The following features will only be implemented in future PRs once Stage 1 has been fully completed and merged:

Status: In progress
