HIVE-27277: GH actions to build and push docker image #4274

Closed
wants to merge 3 commits into from

simhadri-g
Member

Hi Everyone,

Infra has set up the Docker Hub repository for Apache Hive:
https://issues.apache.org/jira/browse/INFRA-24505

DockerHub: https://hub.docker.com/r/apache/hive

To publish the Docker image to Docker Hub, this PR sets up a GitHub Actions workflow that builds and pushes the image. I tested the workflow on a Hive fork, and the image was pushed successfully here: https://hub.docker.com/repository/docker/simhadri064/hive/tags?page=1&ordering=last_updated

We will need to decide how often we push these images to Docker Hub.

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Set up the same workflow on my fork and pushed the image to my personal Docker Hub account via GitHub Actions.

name: ci hive docker image

on:
  push:
Member

Some thoughts on the frequency:

  1. We had better trigger the action for every new release.
  2. For the master branch, I think we can update the image every three months.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a -latest tag for the GA version.
We could also have a daily release for the -dev version (or tags);
building for every commit would be overkill.

Member Author

I think we should have 2 workflows:

  1. GA workflow - Frequency : Once per release
  2. Latest dev images workflow - Frequency: once per week? On average, Hive gets about 10 to 15 commits per week (https://github.com/apache/hive/graphs/commit-activity).

This PR sets up a workflow to build and publish Docker images for the GA versions of Hive.
I will raise a follow-up jira to address the workflow needed for daily/dev images.
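
As a rough illustration of the weekly dev workflow discussed above, a scheduled trigger might look like the sketch below. The workflow name, the `DOCKERHUB_TOKEN` secret, and the `apache/hive:dev` tag are assumptions made for illustration, and the steps that produce the Hive distribution the Dockerfile needs are omitted.

```yaml
# Sketch only: a weekly build of a dev image.
name: weekly hive dev docker image   # hypothetical name

on:
  schedule:
    # 00:00 UTC every Sunday. Scheduled workflows run against the default branch.
    - cron: '0 0 * * 0'

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # (Steps that build the Hive distribution would go here.)

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # assumed secret name

      - uses: docker/build-push-action@v4
        with:
          context: ./packaging/src/docker/
          file: ./packaging/src/docker/Dockerfile
          push: true
          tags: apache/hive:dev   # assumed tag; overwritten on every run
```

Pushing to a single moving tag such as `dev` means each run replaces the previous image, which would sidestep the question of how many dev images to keep.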

Member

Once per week makes sense to me for dev, provided there is some limit on the number of dev images, for example, keeping only the latest 10.

context: ./packaging/src/docker/
file: ./packaging/src/docker/Dockerfile
push: true
tags: ${{ secrets.DOCKERHUB_USERNAME }}/hive:test-image
Member

We'd better tag the image with the real version instead of test-image, and it would be great if we could determine HADOOP_VERSION and TEZ_VERSION from the project.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about -hive:dev or -hive:daily?
The GA version should follow the usual industry convention, like hive4.0-latest, imho.

Member Author

For GA: the versions that are set in the .yml file were manually configured after looking at the hive/pom.xml file.

For hive:daily, I think we can obtain them from the pom.xml file.
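
One possible way to read the versions from pom.xml inside the workflow is the Maven Help Plugin's `help:evaluate` goal, as sketched below. The property names `hadoop.version` and `tez.version`, the `HIVE_VERSION` build argument, the `DOCKERHUB_TOKEN` secret, and the `apache/hive` tag prefix are assumptions here, not something this PR confirms.

```yaml
# Sketch only: derive build arguments from pom.xml instead of hard-coding them.
name: hive docker image from pom versions   # hypothetical name

on:
  workflow_dispatch:

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Read versions from pom.xml
        id: versions
        run: |
          # -q -DforceStdout prints only the evaluated value
          echo "hive=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)" >> "$GITHUB_OUTPUT"
          echo "hadoop=$(mvn help:evaluate -Dexpression=hadoop.version -q -DforceStdout)" >> "$GITHUB_OUTPUT"
          echo "tez=$(mvn help:evaluate -Dexpression=tez.version -q -DforceStdout)" >> "$GITHUB_OUTPUT"

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # assumed secret name

      - uses: docker/build-push-action@v4
        with:
          context: ./packaging/src/docker/
          file: ./packaging/src/docker/Dockerfile
          build-args: |
            HIVE_VERSION=${{ steps.versions.outputs.hive }}
            HADOOP_VERSION=${{ steps.versions.outputs.hadoop }}
            TEZ_VERSION=${{ steps.versions.outputs.tez }}
          push: true
          tags: apache/hive:${{ steps.versions.outputs.hive }}
```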

Member

Can we trigger the build for GA automatically?
https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release
I think adding each new GA build manually is troublesome: it adds extra steps to releasing a new version, and sometimes we may even forget about it.
For old released versions, I think we can push the images manually.
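
For reference, a workflow using the `release` event linked above might be sketched as follows. It assumes that Hive releases are published as GitHub Releases and that the release tag name is also a valid Docker tag (no slashes, for example); neither is confirmed in this thread. The `DOCKERHUB_TOKEN` secret name is likewise an assumption.

```yaml
# Sketch only: build and push automatically when a GitHub Release is published.
name: hive GA docker image on release   # hypothetical name

on:
  release:
    types: [published]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # assumed secret name

      - uses: docker/build-push-action@v4
        with:
          context: ./packaging/src/docker/
          file: ./packaging/src/docker/Dockerfile
          push: true
          # Assumes the release tag name is usable as a Docker tag as-is.
          tags: apache/hive:${{ github.event.release.tag_name }}
```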

Member Author

@simhadri-g simhadri-g Apr 28, 2023

Can we trigger the build for GA automatically?
https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#release

Since we are pushing to Docker Hub for the first time, we wanted to trigger the workflow manually with workflow_dispatch to verify the Docker Hub integration.

Once we verify that this GH action succeeds, we can set it to trigger automatically and update all the images on every release or once every three months.

I think adding each new GA build manually is troublesome: it adds extra steps to releasing a new version, and sometimes we may even forget about it.

I agree, but I think that for a new GA:

  • We will not know in advance which versions of Hive, Tez and Hadoop the next GA will use. (A workaround could be to obtain them from pom.xml.)
  • Someone will have to build the new GA Docker images locally and verify that they work before we push them to Docker Hub.

That is why I was thinking we should retain the manual step at release time.
Other repos follow something similar: https://github.com/apache/spark-docker/tree/master/.github/workflows
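
A manually triggered workflow of the kind described in this comment could take the versions as inputs at release time, roughly as in the sketch below. The input names, the `HIVE_VERSION` build argument, the `DOCKERHUB_TOKEN` secret, and the `apache/hive` tag prefix are hypothetical.

```yaml
# Sketch only: a manual trigger where the release manager supplies the versions.
name: hive GA docker image (manual)   # hypothetical name

on:
  workflow_dispatch:
    inputs:
      hive_version:      # hypothetical input names
        description: 'Hive version to build'
        required: true
      hadoop_version:
        description: 'Hadoop version to bundle'
        required: true
      tez_version:
        description: 'Tez version to bundle'
        required: true

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # assumed secret name

      - uses: docker/build-push-action@v4
        with:
          context: ./packaging/src/docker/
          file: ./packaging/src/docker/Dockerfile
          build-args: |
            HIVE_VERSION=${{ github.event.inputs.hive_version }}
            HADOOP_VERSION=${{ github.event.inputs.hadoop_version }}
            TEZ_VERSION=${{ github.event.inputs.tez_version }}
          push: true
          tags: apache/hive:${{ github.event.inputs.hive_version }}
```

Keeping the trigger manual leaves room for someone to verify the image locally before running the workflow.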

Set up github actions workflow to build and push docker image to docker hub
@sonarcloud

sonarcloud bot commented Apr 28, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 16 (rating A)

No coverage information
No duplication information

@ayushtkn
Member

@zabetak / @abstractdog / @dengzhhu653 any comments or thoughts on this? @simhadri-g has a follow-up planned for this as well; he can share the details.

@simhadri-g
Member Author

simhadri-g commented Apr 28, 2023

Since we are pushing to Docker Hub for the first time, we wanted to trigger the workflow with workflow_dispatch to verify the Docker Hub integration and publish the GA images.

Once we verify that this GH action succeeds, I would like to parameterize it so that it triggers automatically on every release, when a new rel/**- < version > branch gets created.
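
One way to react to a newly created branch is the `create` event, sketched below. That event cannot be filtered by name in the `on:` block, so the job filters on the event payload instead. The `rel/` prefix check is an assumption based on the branch pattern mentioned above, and the build and push steps are left as a placeholder.

```yaml
# Sketch only: run when a branch whose name starts with "rel/" is created.
name: hive docker image on release branch   # hypothetical name

on:
  create:

jobs:
  build-and-push:
    # github.event.ref is the short ref name; ref_type is "branch" or "tag".
    if: github.event.ref_type == 'branch' && startsWith(github.event.ref, 'rel/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Show the ref that triggered the run
        run: echo "Triggered by new branch ${{ github.event.ref }}"

      # (Docker login, build, and push steps would follow here.)
```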

@simhadri-g simhadri-g marked this pull request as draft May 4, 2023 18:22
@simhadri-g
Member Author

Will reopen the PR after testing of the new changes completes. This is to prevent unnecessary runs of the Hive precommit tests.

@simhadri-g simhadri-g closed this May 4, 2023
@simhadri-g
Member Author

New PR:
#4298
