Description
Parent issue: CERNDocumentServer/cds-rdm#440
This issue is part of inveniosoftware/product-rdm#226 to add a GitLab integration. In order to reduce code duplication, the approach will be to adapt invenio-github and turn it into a "generic" module that supports any Version Control System (VCS) as long as it provides the necessary APIs and functionality. Implementations for specific VCSes like GitHub and GitLab will be provided in new contrib files.
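As a rough illustration of the "generic module plus contrib implementations" idea, the sketch below defines a provider-neutral interface and a toy implementation. All names here (`Repository`, `VCSProvider`, `InMemoryProvider`, `PROVIDERS`) are hypothetical and are not taken from the actual invenio-vcs code; a real contrib would call the provider's API instead of reading from a dict.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Repository:
    """Provider-neutral view of a repository (hypothetical fields)."""

    provider: str     # e.g. "github" or "gitlab"
    provider_id: str  # ID assigned by the VCS provider
    name: str         # e.g. "group/project"


class VCSProvider(ABC):
    """Hypothetical interface each contrib implementation would satisfy."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def list_repositories(self, user_id: str) -> List[Repository]:
        """Return the repositories the given user can access."""


class InMemoryProvider(VCSProvider):
    """Toy stand-in for a real contrib, for illustration only."""

    def __init__(self, name: str, repos_by_user: Dict[str, List[str]]) -> None:
        super().__init__(name)
        self._repos_by_user = repos_by_user

    def list_repositories(self, user_id: str) -> List[Repository]:
        names = self._repos_by_user.get(user_id, [])
        return [
            Repository(provider=self.name, provider_id=str(i), name=n)
            for i, n in enumerate(names, start=1)
        ]


# Generic code only looks providers up by name and uses the shared interface.
PROVIDERS: Dict[str, VCSProvider] = {
    "github": InMemoryProvider("github", {"alice": ["alice/thesis"]}),
    "gitlab": InMemoryProvider("gitlab", {"alice": ["group/project"]}),
}
```

With this shape, the generic layers would depend only on `VCSProvider`, so adding another VCS would mean adding one more contrib class rather than duplicating the module.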
Stage 1
Aiming to complete a fully functional, production-ready, well-documented MVP. We will regard it as complete when:
- it provides an end-user experience equivalent to the current GitHub integration, and preferably as similar to it as possible.
  - The current integration has some scalability issues, but for now we will avoid changing too much.
- a clear migration script and guide are available and have been thoroughly tested with existing Zenodo data
- unit tests have been updated
- as many bugs as possible have been fixed
Work is split between several PRs to make reviewing easier. These are all being merged into the vcs-staging branch (so we can safely keep unreleasable code separate from master). Once all the PRs of the first stage are merged, we can merge into master with a single squash commit. Before merging into master, a maint-v3.x branch should be created to continue maintaining the old, GitHub-only version of this module. The first stage consists of:
- feat(vcs): rename github references to vcs #191
- feat(vcs): new data model #192
- feat(vcs): generic provider interface + contrib implementations #193
- feat(vcs): service layer #194
- feat(vcs): view handlers #195
- docs: start writing upgrade guide #196
- tests(vcs): compatibility for invenio-vcs #199
- WIP: compat for new VCS integration invenio-rdm-records#2128
- config: compat with new VCS integration invenio-app-rdm#3162
To see the overall, non-fragmented changes to invenio-vcs, please see my fork's master branch.
Todo for stage 1
- GitLab contrib. This is a priority as it's needed to test a lot of the other features (e.g. auth). It's very difficult to test e.g. OAuth without it.
- OAuth user ID correlation
  - i.e. if the VCS provider uses the same OAuth server to authenticate the user as the Invenio instance, we should check the user IDs to make sure they match. This is useful for CDS-RDM where users will be able to link CERN GitLab, which uses the same CERN SSO.
  - We could express this through a more versatile hook function that returns whether or not we should accept the authenticated user.
  - Update: This can be done relatively easily by configuring a custom info_serializer handler in invenio.cfg. See the example for CDS: WIP: User ID validator for GitLab CERNDocumentServer/cds-rdm#554
- Sync VCS repositories straight into the vcs_repositories table instead of the OAuth remote user extra_data field.
  - This will make querying a lot easier so we can paginate/search on the repository list page, which is currently very slow for users on e.g. GitLab instances where they have membership of thousands of repos due to group membership.
- Check duplication for organisational/team repos if multiple people activate them
- What happens if a user is deleted? How can we transfer the repos?
- The repo name should not be unique on its own. It is unique only as part of the tuple (provider_id, provider, name).
- UI bug with menu not being able to differentiate between multiple dynamically-registered entries
- Unit tests
- Documentation
- Migration script and guide
- Careful testing of DB migration for existing GitHub repos/releases
- Some UI pages have not been adapted and continue to throw errors
- JSONB extra_data in oauthclient
- Correct handling of dependency in InvenioRDM
  - invenio-vcs is now an optional dependency, including in InvenioRDM. Whether the integration is enabled depends on whether the dependency is installed.
  - However, some higher level bindings in invenio-app-rdm and invenio-rdm-records perform overrides on classes in invenio-vcs without checking that it's installed, which causes a crash if it isn't.
  - We need to find a neat way of avoiding this issue
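One possible way to handle the optional-dependency problem described above is to check for the package before touching any of its classes. The sketch below uses only the standard library; the import name `invenio_vcs` and the helper names `is_installed` and `apply_vcs_overrides` are assumptions made for illustration, not existing code in invenio-app-rdm or invenio-rdm-records.

```python
import importlib
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def apply_vcs_overrides(module_name: str = "invenio_vcs") -> bool:
    """Apply overrides only when the optional dependency is present.

    Returns True when the module was found (and overrides could be applied),
    False when the step was skipped because the module is missing.
    """
    if not is_installed(module_name):
        return False
    # Import lazily so that a missing optional package never breaks
    # the import of the module that defines the overrides.
    module = importlib.import_module(module_name)
    # Hypothetical: class overrides on `module` would be performed here.
    assert module is not None
    return True
```

Whether a guard like this belongs in the higher-level packages or behind an entry point is a design choice the issue leaves open; the sketch only shows the check itself.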
Stage 2
The following features will only be implemented in future PRs once Stage 1 has been fully completed and merged:
- Refresh token support
- In the existing GitHub impl we use access tokens which are non-expiring by default. This is a security issue in case of a database leak and is recommended against by RFC 6749.
- A PR exists (OAuth2 Token refresh implemented invenio-oauthclient#328) but needs some more work (last commit May 2024)
- Support for private repositories
- "Link-only" OAuth without an option to "sign in with" a remote
- React + API-based UI for pagination/search of repos, using OpenSearch
- See GitLab integration product-rdm#235 (reply in thread) for details
- Notifications on successful archive: GitLab integration product-rdm#226 (comment)
- Selecting community to directly publish the repo to (especially on community-mandatory instances): GitLab integration product-rdm#235 (comment)
- Propagate permissions so users who have access to a repo also have access to records created from releases
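For the refresh token item in the list above, the core decision is when a stored access token should be refreshed. The following is a minimal sketch under stated assumptions: the `StoredToken` fields and the `needs_refresh` helper are hypothetical and do not reflect the invenio-oauthclient data model or the linked PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StoredToken:
    """Hypothetical stored token; expires_at=None models a non-expiring token."""

    access_token: str
    refresh_token: Optional[str]
    expires_at: Optional[datetime]


def needs_refresh(
    token: StoredToken,
    now: datetime,
    leeway: timedelta = timedelta(seconds=60),
) -> bool:
    """Return True if the token expires within `leeway` of `now`."""
    if token.expires_at is None:
        # Non-expiring tokens (the current GitHub behaviour) never refresh.
        return False
    return now >= token.expires_at - leeway


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = StoredToken("a", "r", now + timedelta(hours=1))
stale = StoredToken("b", "r", now + timedelta(seconds=30))
legacy = StoredToken("c", None, None)
```

Here `needs_refresh(stale, now)` is true because the token expires inside the 60-second leeway, while `fresh` and `legacy` are left alone.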