Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release process is misleading for images published #6517

Open
polarathene opened this issue Sep 15, 2024 · 9 comments
Open

Release process is misleading for images published #6517

polarathene opened this issue Sep 15, 2024 · 9 comments
Assignees
Labels

Comments

@polarathene
Copy link

Apologies if this all comes off a bit blunt btw, I don't mean any disrespect 😅

I've spent the time to share this information for the benefit of Tyk and it's users. Hopefully some of the feedback will be appreciated :)

Biggest concern is what looks like regression with published tags on CI for bug fix releases. It looks like those may not have released the intended fixes correctly, and disrupted semver management of v5.


Due to verbosity, here's a quick summary:

  • Various UX concerns raised, notably with Tyk docs and outdated conventions for Docker Compose.
  • :latest tag is approx 3 years old (Tyk Gateway 3.0).
  • Some mismatch going on with GH release publishing, tagged commits, and image builds from CI?
    • v5 tag points to v5.3.4 not v5.5.0
    • v5.3.4 shares the same tagged commit as v5.3.3. Seems like an accident by @ilijabojanovic ?
    • Both of those "newer" releases are considered older than v5.3.2 tagged commit (August 2nd, same as release date, but image was published in June 6th for this May commit according to image metadata).
    • v5.3.4 was tagged 3 weeks ago, with the image built and published to DockerHub at this time, while v5.3.3 retains it's old build date (August 2nd, image says Aug 1st but that'd be TZ difference) according to docker inspect, so no build job triggered? 🤔
    • I have not inspected the equivalent image tags at other image registries, the release workflows are inconsistent along with branch management. They may also be affected.

Beyond that there was some feedback/tips for release workflow you could adopt 👍

  • Consider zig cc for build benefits and better control over glibc support.
  • Ensure you don't have outdated / inconsistent assumptions in workflows.
    • There is some test logic depending on Tyk Gateway 3.0 and you're using mixed versions of actions (like docker/build-push-action).
    • Some flows look like they could do with better inline documentation comments to distinguish context.
  • Move labels to actual Dockerfile and use ARG. DRY your CI and build logic where possible, especially due to the release branches.
  • Modularize the release workflow out into smaller workflows. If you're unfamiliar with this look at this reference as an entrypoint workflow that delegates to re-usable workflows. You will see that project has several workflows that will get triggered and re-use the same generic-* workflows that always target the main branch. You can change the ref target or co-locate legacy workflows on a dedicated actions/ci branch, the workflow should still run from the original caller branch and only improve the workflow maintenance.

UX concerns

Originally I wanted to see the Tyk binary published on GH releases, but that doesn't seem to exist.

Perhaps it's easily available somewhere else, but the docs did not communicate that well. I'm fine with Docker so gave that a look and was a bit more surprised / confused with what I observed.

Local dev vs demo image + config

Curious I inspected the main Dockerfile for this repo but it didn't seem to match DockerHub, instead that seems to belong to the repo docker-compose.yml and that is discouraged in favor of a separate repo for the compose config. Probably helpful to inline a comment on the mentioned files in this repo to direct users there as well (we don't always go through the README first unfortunately 😓 )

Not aligning with Docker upstream conventions introduced well over a year ago aside, that separate repo refers to an image at a registry that differs from DockerHub?

  • These docs reference DockerHub initially but also use that same docker.tyk.io registry in the example that follows.
  • That docs example uses :latest (which is understandable for docs), but as mentioned below it suffers the same problem as DockerHub for that tag. Likewise the docs have not been updated to reflect the newer upstream Docker conventions.

Actual image used to publish to DockerHub

Back to this repo, I was still curious about the actual image being built for DockerHub. Looked through the release workflow and identified it varied a bit.

There are some inline Dockerfile, while others refer to separate files like this one. I detail concerns regarding this further below.

Docs

The docs UX had some concerns too, but this report is already covering a lot for maintainers to digest as-is, so I'll stop here. I was curious about the OSS Tyk, but found the docs to be a bit discouraging for what I assume is a decent product. I'm not sure if this is just the OSS version being low priority and neglected in these areas, or a tactic to discourage it rather than attract users to adopt Tyk that later switch to the on-prem or SaaS offerings. Definitely needs some love/polish if the latter is intended instead of intentional bad UX.


Release workflow observations

You can add these labels to the Dockerfile and populate dynamic values via ARG:

labels: "org.opencontainers.image.title=tyk-gateway (distroless) \norg.opencontainers.image.description=Tyk Open Source API Gateway written in Go, supporting REST, GraphQL, TCP and gRPC protocols\norg.opencontainers.image.vendor=tyk.io\norg.opencontainers.image.version=${{ github.ref_name }}\n"


Without looking too much into it, you have a CI test on a pinned v3 release of Tyk (there is more than one occurrence of this), that was over 2 years ago, is it intentional for the test to stay on v3, or was it intended for that to be updated with new releases of Tyke Gateway?

RUN curl -fsSL https://packagecloud.io/install/repositories/tyk/tyk-gateway/script.deb.sh | bash && apt-get install -y tyk-gateway=3.0.8


Publishing issues

I'm assuming the below concerns are low priority for Tyk, I've not compared to your other registries where your publishing logic differs a bit. That said it's not good that semver publishing is clearly broken in your CI pipeline (due to patch release).

:latest tag - Please fix

FWIW, you have a :latest tag (which will be used implicitly as the default when none is provided) that was last updated approx 3 years ago:

image

Either remove that tag or keep it updated with the actual latest release.

The DockerHub README doesn't seem to be tailored to the image itself, but rather aligning with the Github README. There's no mention about the :latest tag concern and that it should be avoided at present.


v5 tag regression - Incorrectly points to older minor release (and other oddities)

As release v5.3.4 is newer than v5.5.0 release publish date wise, the CI has published that point release as the latest v5 semver tag which is inaccurate, it should be v5.5.0:

image

image

Presumably that tag is unreliable whenever you're making point releases. CI needs to know only to update major tag when the latest minor is updated (new minor or point release for that minor release).

Related - GH Releases (less important and less actionable AFAIK)

This also affects the GH release latest URL (https://github.com/TykTechnologies/tyk/releases/latest), which presently redirects to releases/tag/v5.3.4. That's unrelated to the workflow and I'm not sure if there's much you can do about that behaviour. The main releases/ URL seems to sort by commit date, not release date from what I can see.

image

So on the actual releases page v5.5.0 is the top shown result as it's tagged with the newest commit date, despite being released over a month ago (no complaints there). Meanwhile on the main repo page the sidebar shows the last made release and the implicit latest tag from GH (again no complaints, just documenting the difference):

image

v5.3 series wrong commits tagged for publishing?

That tag is for a commit from July 25 2024 for which 5.3.4 was tagged as release, but the release itself was published on August 27 2024:

image

image

In fact v5.3.3 and v5.3.4 were both tagged with the exact same commit for the 5.3 release branch, but v5.3.2 was tagged with a newer commit date:

image

Since when does v5.3.2 release after v5.3.4?? It should be sequential releases with the higher point release as newer than their prior commits? Something seems off there?

v5.3.2 does appear to have older RC releases from May 2024:

image

Here you can again see tags sharing commits along with v5.3.3 + v5.3.4 development tags for commits newer than those that were tagged for publishing.

image

NOTE: They are prior to the official v5.5.0 tag release, and that the v5.3.5-alpha1 is from Aug 29 (2 days after the 5.3.3 + 5.3.4 GH release was made for the tagged July 25 commit). It definitely seems unintentional and something went wrong with the release process?

Docker image v5 metadata for confirmation

Inspect the v5 image published to DockerHub presently has the following labels further confirming the concern:

"Id": "sha256:1a5cfdac46a1375dcd5a847b2d9cc844b44dfd72a131d9f01f2fadf54c642268",
"RepoTags": [
  "tykio/tyk-gateway:v5"
],
"RepoDigests": [
  "tykio/tyk-gateway@sha256:8fb0f9b9d911b8bef254b80990e4ee331a6d91ba07315559baf5f743e1313059"
]
"Labels": {
  "org.opencontainers.image.created": "2024-08-26T15:25:53.168Z"
  "org.opencontainers.image.revision": "4e82e7da10e922b28764da399edcd80f93f0f0b8"
  "org.opencontainers.image.version": "v5.3.4"

Inconsistencies between images published and release workflow updates (modularize workflow and use @<branch> refs to avoid)

These v5.3 / v5 images were built with this Dockerfile (hence the larger size on DockerHub despite newer publish dates) while the v5.5 image was built with the newer and slimmer Google distroless base as the final stage via a new Dockerfile contributed in either May or June (that release workflow is a mess and should be split to use re-usable workflows instead for modularity, @alephnull contributed them so is probably more familiar with what is going on).

EDIT: The v5.3.3 / v5.3.4 tags also had the May distroless update, but the release workflow reverted back to the old Distroless image for builds (no reason mentioned in the PR notes or review process, but presumably intentional? Otherwise modularize your workflow and version it accordingly instead of bug prone sync issues from divergence of branches impacting CI).

Commit history nit

The commits for the June update also seem odd if they were all meant to be named like that with separate PRs:

image

zig cc suggestion for avoiding glibc support woes

While browsing the commits, you also have this one citing a concern with a glibc problem, forcing you to roll back from Debian Bookworm (12) to Debian Bullseye (11).

If you're unable to do a static build of Tyk, then I would suggest you look at adopting zig cc for the external dependencies to your Go app.

It'll allow you to pin the glibc version to a min supported API for compatibility when you need it so long as it's possible (that said do benchmark/profile, as lowering too far back for broader compatibility may opt-out of newer API calls that perform better).

@buger
Copy link
Member

buger commented Sep 15, 2024

Thats gold, thank you! It will take some time to process it :)

Re versioning and latest. It was hard for us to decide on this, as it has mulitple procs and cons.
Keep in mind that we do maintain LTS versions (which is currently 5.3.X and 5.0.X), and usually recommended options for the enterprise customers), thats why some tags point to it.

@polarathene
Copy link
Author

Re versioning and latest. It was hard for us to decide on this, as it has mulitple procs and cons.

Yes I'm well aware of the problems with :latest tag and anyone deploying images for production should know the risks of that.

The problem is you have a :latest tag on the registry and that will be pulled by default if no tag is specified, yet it's 3 years old. Since the Tyk OSS docs have instructions that document pulling the image with :latest tag, if the user follows that instead of the example repo (which pins to the latest stable release), then they'll be following the guide from Tyk Gateway 3.0.

I don't think that benefits anyone, either causing the user to be frustrated and unaware why they might run into problems, silently disengaging from Tyk to evaluate other products, or your support teams need to account for that with any troubleshooting offered? I guess if it were a visible problem you'd be more aware of that by now, so either the users aren't following those docs, or they're silently giving up, or they're competent enough to troubleshoot and notice the issue themselves.

If DockerHub allows for dropping the :latest tag, perhaps remove that. Otherwise I would suggest keeping it updated. It'll likely have the same issue as the v5 tag currently does, but that's technically better than leaving it pinned to v3.0.


Keep in mind that we do maintain LTS versions (which is currently 5.3.X and 5.0.X), and usually recommended options for the enterprise customers), thats why some tags point to it.

Sorry I don't quite get what you meant here? Just so we're on the same page:

  • Continuing to maintain older releases with point releases is totally ok for products like Tyk 😁
  • The tag v5.0 and v5.3 would always be updating to point to the latest point release of those release branches.
  • v5 (major only) should resolve to the latest minor release of that series, so presently that should be v5.5.0, but something in your release workflow is unintentionally updating that tag when a new point release is made. Non-issue for new minor releases since those will always be a bump up and thus v5 should point to that (and it's subsequent point releases until another new minor release is tagged and so forth).

One way that you can probably avoid the v5 tag issue is to take the advice for splitting up the release workflow.

Have your release branches call the docker metadata tag + publish jobs from @master or a dedicated actions branch, with an input to opt-out of the major tag? Or if it's easier in the meantime, keep the duplicated workflow as-is and remove the major tag line since previous minor releases should not need to update the major tag.

You'll just need to remember to do this each time a new minor is published and you branch off. Actually, you could probably determine this with a better check 🤔 Yeah I think you could automate it by sorting/comparing git tags semver wise, there are actions for this or you can implement the logic yourself. That way there's no need to toggle an opt-in/opt-out input for the major tag publishing, nor separate workflow copies on release branches, just use that check as a conditional 👍

@alephnull
Copy link
Contributor

Thanks for the detailed audit of the releng code! Restricting myself to the bits that I know.

v5 tag points to v5.3.4 not v5.5.0

Yes, this can be considered a bug. Due to our somewhat idiosyncratic release versioning this tag does not always DTRT. We should probably stop pushing this tag. The v5.{0,3,5} remove ambiguity and you probably want to use them. You certainly do not want a tag that jumps between versions that may break compatibility.

v5.3.4 shares the same tagged commit as v5.3.3

I'm not sure why the UI shows it as identical commits. v5.3.3 is on 9eb415d and v5.3.4 is on eadbffd.

you have a CI test on a pinned v3 release of Tyk (there is more than one occurrence of this), that was over 2 years ago, is it intentional for the test to stay on v3, or was it intended for that to be updated with new releases of Tyke Gateway?

That version (3.0.8) was the last release before the current releng started operating so we want to make sure that we can upgrade the package from that known version.

To clarify a few things:

  • distroless originally landed as a separate series of tags (prefixed with s instead of v) on v5.4. After finding them backward compatible, the s series of tags was retired and the v series became distroless
  • the final implementation of distroless was backported to v5.3 (LTS)
  • ./Dockerfile is (was?) used by developers for local testing. ci/Dockerfile.distroless is the image that is pushed to dockerhub
  • docker.tyk.io is another registry that contains the same images that are pushed to Docker Hub. The tagging is identical. They are used for the Helm charts IIRC.

Despite the confusion caused by the v5 tag, compounded by outdated documentation, it does not appear that any of the published image contents are incorrect. Just pretend the v5 tag does not exist.

@polarathene
Copy link
Author

You certainly do not want a tag that jumps between versions that may break compatibility.

v5 would be similar to latest but scoped to the major release, similar to v5.5 scoping to major minor. The only issue is your release workflow needs adjustment to be compatible so it doesn't regress to releases from earlier minors in the v5.x series.

Some software like Watchtower for example will only monitor for changes to a semver tag like v5 or v5.5 to be notified of updates to that tag which generally would be the next minor or patch release.

Since minors and patch parts of a semver tag should not introduce breaking changes, this is normally not something to worry about. It's only the regression bug that makes it problematic.


v5.3.4 shares the same tagged commit as v5.3.3

I'm not sure why the UI shows it as identical commits. v5.3.3 is on 9eb415d and v5.3.4 is on eadbffd.

Your links are 404 for me. Probably only visible to your team members, so I cannot inspect to compare. Note that the links you provided point to Tyk Analytics not Tyk Gateway if that makes any difference.

It's not just the UI. your CI published these images for those tags with that commit:

$ docker run --rm -it tykio/tyk-gateway:v5.3.4 version --json

{
    "Version": "5.3.4",
    "BuiltBy": "goreleaser",
    "BuildDate": "2024-08-26T15:17:54Z",
    "Commit": "4e82e7da10e922b28764da399edcd80f93f0f0b8",
    "Go": {
        "Os": "linux",
        "Arch": "amd64",
        "Version": "go1.21.13"
    }
}
$ docker run --rm -it tykio/tyk-gateway:v5.3.3 version --json

{
    "Version": "5.3.3",
    "BuiltBy": "goreleaser",
    "BuildDate": "2024-08-01T19:48:25Z",
    "Commit": "4e82e7da10e922b28764da399edcd80f93f0f0b8",
    "Go": {
        "Os": "linux",
        "Arch": "amd64",
        "Version": "go1.21.12"
    }
}
$ docker run --rm -it tykio/tyk-gateway:v5.3 version --json

{
    "Version": "5.3.4",
    "BuiltBy": "goreleaser",
    "BuildDate": "2024-08-26T15:17:54Z",
    "Commit": "4e82e7da10e922b28764da399edcd80f93f0f0b8",
    "Go": {
        "Os": "linux",
        "Arch": "amd64",
        "Version": "go1.21.13"
    }
}
$ docker run --rm -it tykio/tyk-gateway:v5 version --json

{
    "Version": "5.3.4",
    "BuiltBy": "goreleaser",
    "BuildDate": "2024-08-26T15:17:54Z",
    "Commit": "4e82e7da10e922b28764da399edcd80f93f0f0b8",
    "Go": {
        "Os": "linux",
        "Arch": "amd64",
        "Version": "go1.21.13"
    }
}

I mentioned the build date difference between v5.3.3 and v5.3.4 in my report, and that both were built with that commit. It would seem that @ilijabojanovic accidentally tagged the v5.3.4 commit with the same commit from the v5.3.3 release.


That version (3.0.8) was the last release before the current releng started operating so we want to make sure that we can upgrade the package from that known version.

Thanks, that makes sense 👍


  • the final implementation of distroless was backported to v5.3 (LTS)

Perhaps, I did mention that in my report too, but noted that it was reverted without explanation. Perhaps that was accidental? Here:

docker run --rm -it  \
  -v /var/run/docker.sock:/var/run/docker.sock \
  wagoodman/dive:latest \
  tykio/tyk-gateway:v5.3

When you run that you'll get some insights into the image layers and contents:

image

image

image

As you can see from the DockerHub screenshots in my initial report, the size does reflect that these are not Distroless variants.


Despite the confusion caused by the v5 tag, compounded by outdated documentation, it does not appear that any of the published image contents are incorrect. Just pretend the v5 tag does not exist.

Then you and I are seeing very different things for some reason?

All I can do is provide the citations to sources that I have with screenshots to show what I am seeing, and the commands that I've shared in this response (and similar at the end of the initial report). These aren't specific to the v5 tag and conflict with your statements.

https://github.com/TykTechnologies/tyk/blob/v5.3.4/.github/workflows/release.yml#L139-L145

- name: push image to CI
if: ${{ matrix.golang_cross == '1.21-bullseye' }}
uses: docker/build-push-action@v5
with:
context: "dist"
platforms: linux/amd64,linux/arm64
file: ci/Dockerfile.std

https://github.com/TykTechnologies/tyk/blob/v5.3.4/.github/workflows/release.yml#L168-L174

- name: build multiarch image
if: ${{ matrix.golang_cross == '1.21-bullseye' }}
uses: docker/build-push-action@v5
with:
context: "dist"
platforms: linux/amd64,linux/arm64
file: ci/Dockerfile.std

I've linked above to the workflow for the direct tag v5.3.4 (commit 4e82e7da10e922b28764da399edcd80f93f0f0b8).

You can see that it's running the Dockerfile.std not Dockerfile.distroless like master branch. This type of issue could be avoided by managing the workflow like suggested earlier.

@alephnull
Copy link
Contributor

Ah, sorry. Now I see that that 5.3.3 and 5.3.4 are on the same commit. The point I'm trying to make is that the image corresponds to where the tag was pushed. Whether the tag is on the right place is not something I can comment on.

Your proposal for v5 is well thought out but this is a policy matter that will have to be decided by the those who are affected by the decision.

@polarathene
Copy link
Author

The point I'm trying to make is that the image corresponds to where the tag was pushed. Whether the tag is on the right place is not something I can comment on.

Just so we're on the same page:

  • A commit is tagged on the repo (either pushed or added via GH Release web UI), this triggers a CI build for that tag.
  • Workflows can be triggered by branches or tags, a tagged commit has no branch context IIRC (at least not as far as the workflow triggers are concerned branch and tag are separate, you don't get branch context).
  • That commit is checked out and any workflow present in that history will be used, otherwise fallback to the primary branch (master in this case) for the workflows. My advice here is to minimize the entrypoint workflows so that they all ref a specific branch to centralize workflow management and avoid the sync issues you've had.
  • Actual build of Tyk happens for that commit. Both the binary and the image metadata report the build commit used.
  • Image metadata is prepped like image tags and then pushed to registries.

The tag for v5.3.4 is clearly wrong, look at the information I provided.

  • The image build data for v5.3.3 along with the date that the commit was tagged in git indicate that was handled correctly.
  • v5.3.4 was mistakenly tagged to the same v5.3.3 commit, thus the build and the intended fixes were not actually released, it was instead a rebuild of v5.3.3 and pushed out as a new version since the commit in git was wrong.

Your proposal for v5 is well thought out but this is a policy matter that will have to be decided by the those who are affected by the decision.

I understand, I'm just a random user chiming in to let you know. I don't use Tyk but spotted this while evaluating if Tyk would be suitable to adopt.

Hopefully those involved in the decision can understand how bad the current approach is for automation to be these large single workflow files with re-use of flows that diverge not only in a single workflow, but now in the release branches and the bugs you're getting related to that when they're not synced properly.

What I've proposed has no drawbacks AFAIK, but if you're aware of any please let me know.

@alephnull
Copy link
Contributor

Appreciate the time and effort to go into such detail.

@polarathene
Copy link
Author

Apologies for all the noise, the bulk of what I reported was from:

  • An accident tagging the wrong commit, publishing a release without any actual fixes.
  • Outdated workflows from that commit triggered that have since been fixed on release-5.3.
  • Some minor concerns that aren't critical related to UX.

I had some misunderstandings during the report too from navigating the git history as I was not familiar with gromit at the time, but @alephnull has been helpful with his responses adding the context I needed 😎 (thanks!)


Based on discussion at TykTechnologies/gromit#357

  1. Since v5.3.4 was tagged with the commit for v5.3.3, it used the workflow from that same commit.
  2. The release-5.3 branch had been updated prior to tagging v5.3.4 which would have used the updates to publish with Distroless base image and any other improvements/fixes to the workflow since on the release-5.3 branch.
  3. If the workflows were modularized like this example to use branch refs for the real workflows managed by templates with gromit, the intended workflow would have been used instead of an outdated one.
  4. Not much could be done about the incorrect tagged commit, other than adding some heuristics like two semver tags for the same commit, or ensuring the tag matches some metadata elsewhere in the project repo (akin to .go-version](https://github.com/TykTechnologies/tyk/blob/master/.go-version), in Rust projects metadata from Cargo.toml could ensure the version matches the tag otherwise bail).
  5. Preventing the regression with v5 image tag updates is possible via another action check, but will instead the decided fix is to no longer be publish the tag (and potentially remove it from the registries).

Once v5.3.5 is released with the correct commit, it'll carry the v5.3.4 fixes that were never actually released 👍


There still might be some subtle regressions slipping through PRs:

- name: build multiarch image
if: ${{ matrix.golang_cross == '1.21-bullseye' }}
uses: docker/build-push-action@v5

Which from what I understand with the gromit templates sync, that'll be fixed implicitly. The concern is more about the flip/flop reverting that probably shouldn't be happening? Depending on timing of a workflow run it'll potentially be on stale changes (technically not timing of the run since you do not use branch refs for workflow selection, if the commit is older it'll be stuck with the stale workflow).

@polarathene
Copy link
Author

Actionables

Feel free to close this issue, below is a task summary of what has been covered:

  • v5.3.5 release will technically resolve the v5.3.4 release mishap.
  • v5 tag will either be updated correctly, or removed. If the existing tag is kept, like latest it should really be documented clearly that it's deprecated and should be avoided.
  • Docs should not reference latest tag. Clearly warn the user as unlike most published images latest is not maintained. Alternatively remove the CLI snippet for a direct image pull and update to align documentation examples with the referenced compose repo where the version tag is maintained.
  • Consider the suggestions for improving CI processes further that would avoid/minimize some subtle bugs that I've reported occurring in future from human error.

alephnull added a commit to TykTechnologies/gromit that referenced this issue Sep 19, 2024
alephnull added a commit to TykTechnologies/gromit that referenced this issue Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants