HIVE-27277: GH actions to build linux/amd64 and arm64 images and push docker hub #4614

simhadri-g · 2023-08-17T12:36:12Z

Set up github actions workflow to build and push docker image to docker hub

Hi Everyone,

I have got the Docker Hub repository for Apache Hive set up by Infra.
https://issues.apache.org/jira/browse/INFRA-24505

DockerHub: https://hub.docker.com/r/apache/hive

In this PR, I have set up a GitHub Actions workflow to automatically publish the Docker image to Docker Hub on every release.

Opening a new PR because the older PR was auto-closed: #4298

Updated description after the latest changes:
In the latest patch, the GitHub action is divided into two parts:

Build from existing binaries for old releases. (BuildFromArchive - manually triggered)
Build from source on tag creation for new releases. (BuildFromSource - auto-triggered on new tag creation)

This GitHub action publishes Docker images for both platforms:

linux/amd64
linux/arm64
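
For illustration, building and pushing one image for both platforms can be sketched with docker buildx roughly as below. This is a minimal sketch and not the workflow from this PR: the image name, the version, and the build context (`.`) are assumptions, and the command is only printed so the snippet can be tried without a Docker daemon.

```shell
#!/bin/sh
# Illustrative values only; the real workflow may use different names.
IMAGE="apache/hive"
VERSION="4.0.0-beta-1"
PLATFORMS="linux/amd64,linux/arm64"

# Print the buildx command instead of running it (dry run).
build_cmd() {
  printf 'docker buildx build --platform %s --tag %s:%s --push .\n' \
    "$PLATFORMS" "$IMAGE" "$VERSION"
}

build_cmd
```

Running the printed command for real would additionally need a buildx builder that supports both platforms and a prior `docker login`.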

Update 2:
We don't have to worry about the GitHub storage limits anymore. The images are now built and published in a single job, and any artifact created and stored within a job does not count towards the GitHub storage limit.

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

Set up github actions workflow to build and push docker image to docker hub

.github/workflows/docker-GA-images.yml

packaging/src/docker/Dockerfile

zabetak · 2023-08-21T11:45:14Z

Whenever we have a new release we create a new tag under rel e.g., https://github.com/apache/hive/tree/rel/release-4.0.0-beta-1.

Can't we simply launch the workflow on rel/tag creation, build the project, and publish from there?
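
As a rough sketch of what a tag-triggered build would need, the image version could be derived from the tag ref with plain shell parameter expansion. The helper name `version_from_ref` is hypothetical, and the ref format assumes the `rel/release-<version>` tag naming from the example above together with the usual `refs/tags/` prefix.

```shell
#!/bin/sh
# Hypothetical helper: turn a Git tag ref into a version string.
version_from_ref() {
  ref="$1"
  tag="${ref#refs/tags/}"       # drop the ref prefix, leaving rel/release-<version>
  echo "${tag#rel/release-}"    # drop the release prefix, leaving <version>
}

version_from_ref "refs/tags/rel/release-4.0.0-beta-1"   # prints 4.0.0-beta-1
```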

simhadri-g · 2023-08-21T14:15:48Z

> Whenever we have a new release we create a new tag under rel e.g., https://github.com/apache/hive/tree/rel/release-4.0.0-beta-1.
>
> Can't we simply launch the workflow on rel/tag creation, build the project, and publish from there?

We can, but there is one constraint with GitHub Actions. One Hive tar.gz is about 450 MB, and for linux/amd64 and linux/arm64 we will need a total of almost 1 GB to store the build artifacts.

But GitHub's free storage for artifacts is only 0.5 GB (as far as I am aware). Once we exceed the threshold, it will affect other GitHub actions as well, and those actions will fail.
https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#included-storage-and-minutes

I am not sure if Apache GitHub repos have additional limits.

To avoid hitting these limits, we decided to download the tars from the Apache archive.
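
The arithmetic behind this concern can be sketched as follows. The numbers are the approximate figures quoted in this comment (450 MB per tarball, 0.5 GB of free storage taken as 500 MB), not measured values.

```shell
#!/bin/sh
TARBALL_MB=450      # approximate size of one Hive tar.gz (from the comment above)
PLATFORMS=2         # linux/amd64 and linux/arm64
FREE_QUOTA_MB=500   # 0.5 GB free artifact storage, as assumed in the comment above

needed_mb=$((TARBALL_MB * PLATFORMS))
over_mb=$((needed_mb - FREE_QUOTA_MB))

if [ "$over_mb" -gt 0 ]; then
  echo "need ${needed_mb} MB, free quota ${FREE_QUOTA_MB} MB: over by ${over_mb} MB"
else
  echo "need ${needed_mb} MB, fits within the free quota"
fi
# prints: need 900 MB, free quota 500 MB: over by 400 MB
```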

simhadri-g

Updated the PR to build from source for new releases, and to support building from binaries for older releases.

packaging/src/docker/Dockerfile

.github/workflows/docker-GA-images.yml

packaging/src/docker/Dockerfile

simhadri-g · 2023-08-23T07:04:29Z

Tested the build-from-source workflow on my personal account here: https://github.com/simhadri-g/hive/actions/runs/5944353617/job/16121443889
Images pushed to Docker hub: https://hub.docker.com/layers/simhadri064/hive/4.0.0-beta-1-snapshot/images/sha256-4a883c3b2e616e101cc6d8ce313078d89f608cff8a5e7862f4f3abab30c38fda?context=explore

sonarcloud · 2023-08-23T08:14:11Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

zratkai · 2023-08-22T08:31:20Z

.github/workflows/docker-GA-images.yml

+
+      -
+        name: 'Set up JDK 8'
+        uses: actions/setup-java@v1


For AMD64/ARM64 it needs a different Java version. Is this magic done by this build?

Docker documents support for multi-arch builds, which is what we are using here:
https://docs.docker.com/build/ci/github-actions/multi-platform/

Let me confirm this in the final arm64 images.

We use openjdk:8-jre-slim, which has both arm64 and amd64 images.
I think these will be pulled according to the architecture. Let me confirm.
https://hub.docker.com/layers/library/openjdk/8-jre-slim/images/sha256-885d7cea2430cd637b3592118e1d52abdad90300e2e491e7b457319edd39123d
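
To make the "pulled according to the architecture" point concrete, here is a small sketch that maps the host machine name to the Docker platform string that would normally be selected. The helper name `docker_platform` is hypothetical and the mapping only covers the two platforms discussed here.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the matching Docker platform string.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

docker_platform "$(uname -m)"

# With Docker installed, the variant that was actually pulled can be checked with:
#   docker image inspect --format '{{.Os}}/{{.Architecture}}' openjdk:8-jre-slim
```

On an Apple Silicon Mac, `uname -m` reports `arm64`, so the arm64 variant is the one expected there.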

I tested it on macOS with an M1 chip, but it does not work.

Hi,

I just verified on an M2 Mac:

>>> docker pull simhadri064/hive:4.0.0-beta-1-snapshot
>>> docker images
REPOSITORY         TAG                     IMAGE ID       CREATED        SIZE
simhadri064/hive   4.0.0-beta-1-snapshot   eaf428cf8e2c   36 hours ago   1.43GB
>>> export HIVE_VERSION=4.0.0-beta-1-snapshot
>>> docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 simhadri064/hive:${HIVE_VERSION}
8bafa42f8724b09cb9487a218c9619cd27811033e376d611f5e13134c92a81e0

Running queries via beeline

beeline> !connect jdbc:hive2://localhost:10000/;
0: jdbc:hive2://localhost:10000/> create table hive_example(a string, b int) partitioned by(c int);
INFO : Compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.023 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.214 seconds
No rows affected (0.251 seconds)
0: jdbc:hive2://localhost:10000/> insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
INFO : Compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 1.033 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO : Query ID = hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into hive_exam...... ('a', 2),('b',3) (Stage-1)
INFO : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313], Dag ID: [dag_1692872917513_0001_1], DAG Session ID: [application_1692872917513_0001]
INFO : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 0.91 s
----------------------------------------------------------------------------------------------
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table default.hive_example partition (c=1) from file:/opt/hive/data/warehouse/hive_example/c=1/.hive-staging_hive_2023-08-24_10-28-36_073_5938108838462584033-1/-ext-10000
INFO : Starting task [Stage-3:STATS] in serial mode
INFO : Executing stats task
INFO : Partition {c=1} stats: [numFiles=1, numRows=3, totalSize=12, rawDataSize=9, numFilesErasureCoded=0]
INFO : Completed executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 2.038 seconds
3 rows affected (3.118 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> select count(distinct a) from hive_example;
INFO : Compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.2 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO : Query ID = hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO : Session is already open
INFO : Dag name: select count(distinct a) from hive_example (Stage-1)
INFO : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56], Dag ID: [dag_1692872917513_0001_2], DAG Session ID: [application_1692872917513_0001]
INFO : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)
INFO : Completed executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.587 seconds
+------+
| _c0  |
+------+
| 2    |
+------+
1 row selected (0.832 seconds)
0: jdbc:hive2://localhost:10000/>

Tested, and works now. Thanks!

zabetak

LGTM, thanks for making this perfect @simhadri-g !

If we want to test it, and create a fake tag after this change gets in, is it possible to remove that image afterwards from dockerhub?

simhadri-g · 2023-08-24T15:39:03Z

Thanks for the review @zabetak , @dengzhhu653 @nrg4878 @aturoczy @zratkai ! :)

> If we want to test it, and create a fake tag after this change gets in, is it possible to remove that image afterwards from dockerhub?

Yes, I think it should be possible.
I think currently only @ayushtkn has access to the Hive Docker Hub account to delete images.

…dappa reviewed by Stamatis Zampetakis, Zhihua Deng, Naveen Gangam, Attila Turoczy, Zoltan Ratkai)

1. Add (manually triggered) action for building images from archives (for old releases)
2. Add action for building images from sources automatically triggered on new tag creation.
3. Publish docker images for amd64 and arm64 platforms to dockerhub

Closes apache#4614

asf-ci-hive added the tests pending label Aug 17, 2023

simhadri-g force-pushed the master branch from 6457877 to b29d2e3 Compare August 17, 2023 12:37

asf-ci-hive added tests failed and removed tests pending labels Aug 17, 2023

HIVE-27277: GH actions to build and push docker image

43511e3

Set up github actions workflow to build and push docker image to docker hub

simhadri-g force-pushed the master branch from b29d2e3 to 43511e3 Compare August 17, 2023 13:17

asf-ci-hive added tests pending tests failed tests passed and removed tests failed tests pending labels Aug 17, 2023

dengzhhu653 reviewed Aug 20, 2023

.github/workflows/docker-GA-images.yml

dengzhhu653 reviewed Aug 20, 2023

packaging/src/docker/Dockerfile

Add support for multiple platforms

6bf207e

asf-ci-hive added tests pending and removed tests passed labels Aug 21, 2023

simhadri-g changed the title ~~HIVE-27277: GH actions to build and push docker image~~ HIVE-27277: GH actions to build linux/amd64 and arm64 images and push docker hub Aug 21, 2023

asf-ci-hive added tests unstable tests pending and removed tests pending tests unstable labels Aug 21, 2023

simhadri-g force-pushed the master branch 2 times, most recently from b07c246 to 298508d Compare August 22, 2023 19:56

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Aug 22, 2023

Build on tag creation

1954c1c

simhadri-g force-pushed the master branch from 298508d to 1954c1c Compare August 22, 2023 21:44

asf-ci-hive added tests failed tests pending tests unstable and removed tests pending tests failed labels Aug 22, 2023

simhadri-g commented Aug 23, 2023

packaging/src/docker/Dockerfile

.github/workflows/docker-GA-images.yml

.github/workflows/docker-GA-images.yml

packaging/src/docker/Dockerfile

asf-ci-hive added tests pending and removed tests unstable labels Aug 23, 2023

asf-ci-hive added tests passed and removed tests pending labels Aug 23, 2023

zratkai reviewed Aug 23, 2023

simhadri-g requested a review from dengzhhu653 August 24, 2023 10:52

zratkai approved these changes Aug 24, 2023

zabetak approved these changes Aug 24, 2023

zabetak closed this in b7e3e1d Aug 25, 2023
