Trusted publishing: support for GitLab CI #13575

Closed
2 tasks done
woodruffw opened this issue May 4, 2023 · 20 comments

Comments

@woodruffw
Member

woodruffw commented May 4, 2023

Per #12465 (comment): GitLab now supports a customizable aud for their CI-issued identity tokens, meaning that it should be possible to integrate them as a provider of trusted publishers!

Some initial tasks:

  • Extract some GitLab OIDC tokens and inspect their claim set
  • Determine the appropriate set of fields/claim constraints to expose to users

From there, the actual development tasks on this should look similar to the tasks enumerated in #13551.

@facutuesca
Contributor

I'm taking a look at this

@woodruffw
Member Author

Thanks @facutuesca! I've assigned you.

@facutuesca
Contributor

> Extract some GitLab OIDC tokens and inspect their claim set

Here's one:

{
  "namespace_id": "12885807",
  "namespace_path": "user",
  "project_id": "53804090",
  "project_path": "user/oidc_test",
  "user_id": "9420787",
  "user_login": "user",
  "user_email": "[email protected]",
  "pipeline_id": "1136338956",
  "pipeline_source": "push",
  "job_id": "5919171663",
  "ref": "main",
  "ref_type": "branch",
  "ref_path": "refs/heads/main",
  "ref_protected": "true",
  "runner_id": 12270807,
  "runner_environment": "gitlab-hosted",
  "sha": "0a037033c5d88a5e5fd4d0658740c7a952b4f317",
  "project_visibility": "private",
  "ci_config_ref_uri": "gitlab.com/user/oidc_test//.gitlab-ci.yml@refs/heads/main",
  "ci_config_sha": "0a037033c5d88a5e5fd4d0658740c7a952b4f317",
  "jti": "1a78bc43-842b-42bc-994c-25c12124af5e",
  "iss": "https://gitlab.com",
  "iat": 1705066514,
  "nbf": 1705066509,
  "exp": 1705070114,
  "sub": "project_path:user/oidc_test:ref_type:branch:ref:main",
  "aud": "pypi"
}

The fields are documented in https://docs.gitlab.com/ee/ci/secrets/id_token_authentication.html#token-payload.
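
For anyone who wants to inspect a token's claim set themselves, here is a minimal sketch that decodes the payload segment of a JWT using only the standard library. The helper name `decode_jwt_claims` is made up for illustration, and the function does not verify the signature, so it is only suitable for inspection, never for authentication.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the UNVERIFIED claim set of a compact-serialized JWT.

    Hypothetical helper for inspection only: the signature is ignored.
    """
    # A compact JWT has three base64url segments: header.payload.signature
    _header, payload, _signature = token.split(".")
    # base64url segments omit "=" padding; restore it before decoding
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Inside a CI job, the token would come from the environment variable declared under `id_tokens` in the job definition.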

Since the iss will be whatever the URL of the GitLab instance is, I'm assuming that for this provider we're not supporting self-hosted instances, right? That is, for now only "iss": "https://gitlab.com" will be supported

As for which claims we want to expose to the user to configure the publisher, these would be the ones that best match the existing GitHub publisher:

{
  "project_path": "user/oidc_test",
  "ci_config_ref_uri": "gitlab.com/user/oidc_test//.gitlab-ci.yml@refs/heads/main",
  "environment": "myenvironment"
}

(environment is optional, and similar to GitHub's environments). Any thoughts on this? @woodruffw @di
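
To make the required/optional distinction concrete, here is a rough sketch of how a token's claims could be compared against a configured publisher. The function name and the shape of the arguments are assumptions for illustration, not the actual implementation.

```python
from typing import Optional


def claims_match(claims: dict, required: dict, optional: Optional[dict] = None) -> bool:
    """Compare token claims against a publisher configuration (hypothetical helper).

    Required claims must be present and equal. Optional claims are only
    enforced when the publisher actually configures a value for them.
    """
    for name, expected in required.items():
        if claims.get(name) != expected:
            return False
    for name, expected in (optional or {}).items():
        # None means "not configured": skip the check entirely
        if expected is not None and claims.get(name) != expected:
            return False
    return True
```

With this shape, a publisher that leaves `environment` unset accepts tokens from any environment, while one that sets it only accepts tokens carrying that exact value.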

@woodruffw
Member Author

> That is, for now only "iss": "https://gitlab.com" will be supported

Correct, similar to GitHub with GHE (which we don't support for now) 🙂

> As for which claims we want to expose to the user to configure the publisher, these would be the ones that best match the existing GitHub publisher:

Yeah, these look good to me -- we'll also want to think about binding user_id, similar to what we do for GitHub (the user gives us their name, and we use the GitHub API to retrieve the backing ID).

Looks like this is probably the relevant endpoint: https://docs.gitlab.com/ee/api/users.html#list-users -- I confirmed that https://gitlab.com/api/v4/users?username=yossarian retrieves the ID we want.
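
A small sketch of that lookup, split so the URL construction and response parsing can be exercised without network access. The helper names are hypothetical; the response shape (a JSON list of user objects with `id` and `username`) is assumed from the users endpoint linked above.

```python
import json
from typing import Optional
from urllib.parse import urlencode

GITLAB_API = "https://gitlab.com/api/v4"


def user_lookup_url(username: str) -> str:
    """Build the users-endpoint URL that filters by username."""
    return f"{GITLAB_API}/users?{urlencode({'username': username})}"


def parse_user_id(response_body: str, username: str) -> Optional[int]:
    """Extract the numeric ID for `username` from the endpoint's JSON response.

    Returns None when the list does not contain the requested user.
    """
    for user in json.loads(response_body):
        if user.get("username") == username:
            return user["id"]
    return None
```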

@facutuesca
Contributor

> Yeah, these look good to me -- we'll also want to think about binding user_id, similar to what we do for GitHub (the user gives us their name, and we use the GitHub API to retrieve the backing ID).

@woodruffw There is an issue with this. On GitHub, the owner of a repository (either a user or an organization) is always public, so you can get the ID through their API. On GitLab though, groups (the equivalent of GH organizations) can be set to private, in which case you cannot get their ID through the API.

So, for example, if I want to configure a repo located at gitlab.com/my_group/my_repo and my_group is set to private, we cannot determine its ID to later verify against an OIDC token.

The same issue happens with project_path and project_id: if a project is set to private, we cannot determine its ID based on its owner and the project name.

As a side note, the user_id field in GitLab's token refers to the user that triggered the workflow, not the owner of the repo.

@woodruffw
Member Author

Thanks for investigating @facutuesca!

Copying from chat: given the above, we can move forward without binding the user/project/etc. IDs. We should re-investigate once we get closer to merging this, just to make sure we haven't missed anything.

Separately, we'll probably want to document these restrictions in the GitLab section of the "Security Model" docs, to clarify that the account resurrection defenses we use for GitHub don't work for GitLab.

@facutuesca
Contributor

facutuesca commented Jan 18, 2024

Another difference with GitHub is that GitLab allows creating CI/CD workflows from YAML files located in other repos, or even at arbitrary URLs (e.g., https://my_website.com/workflow.yml). In cases where the workflow file is not located in the repo running the CI/CD action, the OIDC token provided by GitLab will not include the workflow file path (the ci_config_ref_uri claim). From GitLab's documentation of that claim:

> The ref path to the top-level pipeline definition, for example, gitlab.example.com/my-group/my-project//.gitlab-ci.yml@refs/heads/main. Introduced in GitLab 16.2. This claim is null unless the pipeline definition is located in the same project.

So for our GitLab publisher, I see three possible options:

  1. Make the Workflow name field optional, telling the user it's only valid if the workflow file is in the same repo as the one that runs the CI action.
  2. Leave the Workflow name field as required, and tell the user that a GitLab publisher must have its workflow file in the same repo as the CI action (this would mean we don't support trusted publishing with CI workflow files that are not in the repo)
  3. Remove the Workflow name field altogether.

@woodruffw thoughts?

@woodruffw
Member Author

Huh, that's a surprising feature. Does that mean they fetch arbitrary workflow definitions over HTTPS during CI/CD runs?

I think option (2) makes the most sense here, at least for an MVP -- removing Workflow name as a requirement means (IIUC) that any configured workflow file in the repository could potentially act as the trusted publisher, which is probably not what most OSS projects want.

@facutuesca
Contributor

facutuesca commented Jan 19, 2024

> Huh, that's a surprising feature. Does that mean they fetch arbitrary workflow definitions over HTTPS during CI/CD runs?

Yeah, I tried it with a yml file hosted as a GitHub gist, and it did fetch and use it.

A related question: GitHub workflows are always under .github/workflows/, so the GitHub trusted publisher only asks for the workflow file name (e.g., workflow.yml). Since all workflow files must be under .github/workflows/ (with no subfolders allowed), specifying only the filename is unambiguous: there is no other workflow.yml file that could have been run.

But for GitLab, which allows putting the workflow files anywhere in the repo, the trusted publisher config could ask for the workflow path relative to the repo root (e.g., ci/my_subfolder/my_workflow.yml). Given that there might be multiple workflows with the same filename, only asking for the filename is ambiguous. I think using the full path is the way to go. WDYT? @woodruffw

@woodruffw
Member Author

> Given that there might be multiple workflows with the same filename, only asking for the filename is ambiguous. I think using the full path is the way to go.

Yes, that sounds right to me -- the publisher configuration should unambiguously reference a specific workflow file.
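
As a rough sketch of what option (2) plus full-path matching could look like: the check below rejects tokens whose ci_config_ref_uri is null and otherwise requires the claim to point at the configured file in the configured project. The function name is hypothetical, and ignoring the ref after the "@" is an assumption made here for brevity. Prefix matching is also a simplification; a real implementation would want to parse the URI more strictly.

```python
from typing import Optional


def workflow_matches(
    ci_config_ref_uri: Optional[str],
    project_path: str,
    workflow_filepath: str,
) -> bool:
    """Check that the token was issued for the configured workflow file.

    Hypothetical helper. `workflow_filepath` is relative to the repo root,
    e.g. ".gitlab-ci.yml" or "ci/my_subfolder/my_workflow.yml".
    """
    # The claim is null when the pipeline definition lives outside the
    # project (another repo or an arbitrary URL): reject those tokens.
    if not ci_config_ref_uri:
        return False
    # Claim format, per the example token:
    #   gitlab.com/<project_path>//<workflow_filepath>@<ref_path>
    expected_prefix = f"gitlab.com/{project_path}//{workflow_filepath}@"
    return ci_config_ref_uri.startswith(expected_prefix)
```

Because the full path is part of the expected prefix, a file with the same name in a different directory does not match.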

@woodruffw
Member Author

Just saving some additional context here: one thing we'll want to investigate is whether GitLab's CI/CD OIDC IdP can give us a job_name or similar claim, so that we can scope the publisher down to not just a single workflow definition but all the way down to a single job within the workflow.

For example, ideally we'd be able to capture publish_to_pypi as a claim here:

publish_to_pypi:
  id_tokens:
    PYPI_ID_TOKEN:
      aud: pypi
  script: ...

@facutuesca
Contributor

facutuesca commented Feb 20, 2024

Following the discussion on #15275, one thing we could add is namespace ID validation:
When the user is setting up a GitLab repo as a trusted publisher, we (PyPI) can ask GitLab's API if the provided namespace exists or not (like we do for GitHub users/orgs). If it does, we can store its ID, and use that ID to verify future OIDC tokens. This matches what we do for GitHub in order to prevent account resurrection attacks.

One caveat to this is that it would only work for public GitLab namespaces. GitLab namespaces can be either usernames or groups/subgroups, and since groups can be private, there is no way of getting a private group's ID using the public API. This means that when setting a GitLab trusted publisher, we would only be able to store the namespace ID of public namespaces.

However, there seems to be a limitation in GitLab's API: I can't find a way to get the namespace ID for a public personal namespace. For example, https://gitlab.com/example-user is a user and has a personal namespace example-user, and while I can get the user ID using the API:

$ curl "https://gitlab.com/api/v4/users/?username=example-user"
[{"id":5866417,"username":"example-user", #....

I cannot get the namespace ID, since namespace information can only be accessed through the API if you own it:

$ curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" "https://gitlab.com/api/v4/namespaces/example-user/"
{"message":"404 Namespace Not Found"}

This is in contrast with groups, where the ID returned by the groups API is also the namespace ID:

$ curl "https://gitlab.com/api/v4/groups/example-group"
{"id":91639," #....

In short, there doesn't seem to be a way of getting the namespace_id of a (public) personal namespace using GitLab's API, which means we can't store it during trusted publisher setup and afterwards use it to verify GitLab OIDC tokens.
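
For the namespaces where an ID can be resolved (public groups), the verification side could look roughly like the sketch below. The function name is hypothetical, and treating a missing stored ID as "nothing to compare" is an assumption that mirrors the limitation described above.

```python
from typing import Optional


def namespace_id_ok(stored_namespace_id: Optional[str], claims: dict) -> bool:
    """Compare the token's namespace_id against the ID stored at setup time.

    Hypothetical helper. When no ID could be resolved during setup
    (private group or personal namespace), there is nothing to compare,
    so the check passes and offers no resurrection protection.
    """
    if stored_namespace_id is None:
        return True
    # The example token carries namespace_id as a string; normalize both sides
    return str(claims.get("namespace_id")) == str(stored_namespace_id)
```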

@di
Member

di commented Feb 22, 2024

Update: I've enabled GitLab trusted publishing support on https://test.pypi.org/. The next step is to get the screenshots in #15283 updated.

@facutuesca
Contributor

facutuesca commented Feb 23, 2024

I have successfully used GitLab CI/CD to upload a package to TestPyPI using Trusted Publishing:

Image

I have updated the instructions on how to do it in the docs, to make them clearer and fix a small error. If anyone wants to try it on their own, here's an example .gitlab-ci.yml to do it:

build-job:
  stage: build
  image: python:3-bookworm
  script:
    - python -m pip install -U build
    - python -m build
  artifacts:
    paths:
      - "dist/"

publish-job:
  stage: deploy
  image: python:3-bookworm
  dependencies:
    - build-job
  id_tokens:
    PYPI_ID_TOKEN:
      # This example targets TestPyPI; use "pypi" when uploading to PyPI
      aud: testpypi
  script:
    # Install dependencies
    - apt update && apt install -y jq
    - python -m pip install -U twine id

    # Retrieve the OIDC token from GitLab CI/CD, and exchange it for a PyPI API token
    - oidc_token=$(python -m id PYPI)
    # This example targets TestPyPI; use "https://pypi.org/_/oidc/mint-token" when uploading to PyPI
    - resp=$(curl -X POST https://test.pypi.org/_/oidc/mint-token -d "{\"token\":\"${oidc_token}\"}")
    - api_token=$(jq --raw-output '.token' <<< "${resp}")

    # Upload to PyPI authenticating via the newly-minted token
    # Drop "--repository testpypi" when uploading to PyPI
    - twine upload --repository testpypi -u __token__ -p "${api_token}" dist/*
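
For reference, the curl/jq token exchange in the script above can also be sketched in Python with only the standard library. The function names are made up for illustration; the endpoint path, request body, and the "token" field in the response are taken from the shell commands above.

```python
import json
import urllib.request


def build_mint_request(
    oidc_token: str, index_url: str = "https://test.pypi.org"
) -> urllib.request.Request:
    """Build the POST request that exchanges an OIDC token for an API token."""
    body = json.dumps({"token": oidc_token}).encode("utf-8")
    return urllib.request.Request(
        f"{index_url}/_/oidc/mint-token", data=body, method="POST"
    )


def extract_api_token(response_body: bytes) -> str:
    """Pull the API token out of the JSON response (same as jq '.token')."""
    return json.loads(response_body)["token"]


def mint_api_token(oidc_token: str, index_url: str = "https://test.pypi.org") -> str:
    """Perform the exchange over the network and return the API token."""
    with urllib.request.urlopen(build_mint_request(oidc_token, index_url)) as resp:
        return extract_api_token(resp.read())
```

The returned token would then be passed to twine as the password for the `__token__` user, as in the last line of the script.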

@di
Member

di commented Feb 23, 2024

There were two additional claims that the OIDC token had that we hadn't accounted for: #15466

@di
Member

di commented Mar 7, 2024

Docs have been added in #15192. The last thing to do here is to include this in the blog post (and potentially coordinate with GitLab on an announcement).

@matthewfeickert

> Since the iss will be whatever the URL of the GitLab instance is, I'm assuming that for this provider we're not supporting self-hosted instances, right? That is, for now only "iss": "https://gitlab.com" will be supported

Scanning the rest of this issue quickly, I didn't see a direct reply on this (apologies if I missed it). I don't have any technical experience with this, so are self-hosted GitLab instances something that would be feasible to support in the future? I'm specifically interested in CERN's GitLab instance (c.f. di/id#216), as there are multiple projects there that publish to PyPI where we'd like to transition to using Trusted Publishers.

@di
Member

di commented Apr 23, 2024

I think it would be technically possible to support a self-hosted instance like this (I see that https://gitlab.cern.ch/.well-known/openid-configuration and https://gitlab.cern.ch/oauth/discovery/keys are both publicly available, which is all we need for verification). The real question is what the process would be for adding support for all these one-off issuers.

I think one thing we could do here would be to allow the user to optionally configure the iss field as well. My main concern would be that this would allow anyone to essentially masquerade as a GitLab instance, and publish from anywhere, which would give me a little less confidence in the security of the publish event as a user.

Another option is that we allow-list certain issuers for projects in certain organizations, and manually handle these on a case-by-case basis.

Open to ideas though!
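
To illustrate the allow-list option, here is a minimal sketch, assuming a hard-coded set of trusted issuers. Both `ALLOWED_ISSUERS` and `discovery_url` are hypothetical names, and the discovery path is the one from the CERN URL above.

```python
# Hypothetical allow-list; entries would be added manually, case by case
ALLOWED_ISSUERS = {"https://gitlab.com"}


def discovery_url(issuer: str) -> str:
    """Return the OIDC discovery document URL for an allow-listed issuer.

    Raises ValueError for any issuer that has not been explicitly trusted,
    so an arbitrary host cannot masquerade as a GitLab instance.
    """
    if issuer not in ALLOWED_ISSUERS:
        raise ValueError(f"untrusted issuer: {issuer}")
    return issuer + "/.well-known/openid-configuration"
```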

@di
Member

di commented Apr 23, 2024

Moving the discussion about self-hosted instances here: #15838

Closing this since we have launched support! https://blog.pypi.org/posts/2024-04-17-expanding-trusted-publisher-support/

@di closed this as completed Apr 23, 2024
@facutuesca
Contributor

facutuesca commented Apr 23, 2024

@matthewfeickert the reply for that particular point was in this comment. As for your question, I believe it is technically feasible. It would mean adding a new field to the Trusted Publisher configuration, where you specify the issuer_url (the URL of your self-hosted instance). That URL would then be used to verify the claims, and also to pull the signing keys used to verify the signature in the OIDC token (in CERN's case, the ones here).

The question is whether PyPI wants to accept arbitrary OIDC publisher URLs, since this raises both security and maintenance concerns (PyPI would then have to deal with any issues caused by a particular self-hosted instance's misconfiguration).

edit: ah I see @di beat me to it 😆
