HIVE-27277: GH actions to build linux/amd64 and arm64 images and push docker hub #4614

Closed
wants to merge 3 commits into from

Member

@simhadri-g simhadri-g commented Aug 17, 2023

Set up github actions workflow to build and push docker image to docker hub

Hi Everyone,

Infra has set up the Docker Hub repository for Apache Hive.
https://issues.apache.org/jira/browse/INFRA-24505

DockerHub: https://hub.docker.com/r/apache/hive

In this PR I have set up a GitHub Actions workflow to automatically publish the Docker image to Docker Hub on every release.

Opening a new PR because the older one was auto-closed: #4298

Updating the description after the latest changes:
In the latest patch the GitHub action is divided into two parts:

  1. Build from existing binaries for old releases (BuildFromArchive, manually triggered).
  2. Build from source on tag creation for new releases (BuildFromSource, auto-triggered on new tag creation).
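
For illustration, here is a minimal sketch of how these two parts could be wired into a single workflow. It is not the workflow file from this PR: the workflow name, the `hiveVersion` input, and the echo steps are assumptions, and the `rel/` prefix check simply mirrors the release tag naming used in the Hive repository.

```yaml
# Hypothetical sketch, not the actual workflow from this PR.
name: docker-images-sketch

on:
  # Fires when a branch or tag is created; the job below filters for tags.
  create:
  # Manual trigger from the Actions tab, intended for already-released versions.
  workflow_dispatch:
    inputs:
      hiveVersion:                      # assumed input name
        description: 'Released Hive version to package'
        required: true

jobs:
  buildFromArchive:
    # Runs only when the workflow is started manually.
    if: github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Show requested version
        env:
          HIVE_VERSION: ${{ github.event.inputs.hiveVersion }}
        run: echo "Would package Hive ${HIVE_VERSION} from the Apache archive"

  buildFromSource:
    # Runs only for newly created tags whose name starts with rel/.
    if: github.event_name == 'create' && github.event.ref_type == 'tag' && startsWith(github.event.ref, 'rel/')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Show tag
        env:
          TAG_NAME: ${{ github.event.ref }}
        run: echo "Would build Hive from source at tag ${TAG_NAME}"
```

The `create` event also fires for new branches, so the sketch checks `ref_type` before building.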

This github action publishes docker images for both:

  1. linux/amd64
  2. linux/arm64
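
A sketch of a multi-platform build-and-push job using Docker's official actions. The workflow name, secret names, image tag, and build context path are assumptions for illustration; only the two platforms and the `apache/hive` repository come from this description.

```yaml
# Hypothetical sketch, not the actual workflow from this PR.
name: multi-platform-sketch

on: workflow_dispatch

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # QEMU lets the amd64 runner emulate arm64 during the build.
      - uses: docker/setup-qemu-action@v2
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USER }}    # assumed secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}   # assumed secret name
      - uses: docker/build-push-action@v4
        with:
          context: ./packaging/src/docker            # assumed build context
          platforms: linux/amd64,linux/arm64
          push: true
          tags: apache/hive:example-tag              # placeholder tag
```

With `push: true`, buildx publishes a single manifest list covering both platforms, so clients pull the variant that matches their architecture.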

Update 2:
We no longer have to worry about the GitHub storage limits. The images are now built and published in a single job, and any artifact created and stored within a job does not count towards the GitHub storage limit.

@zabetak
Contributor

zabetak commented Aug 21, 2023

Whenever we have a new release we create a new tag under rel e.g., https://github.com/apache/hive/tree/rel/release-4.0.0-beta-1.

Can't we simply launch the workflow on rel/tag creation, build the project, and publish from there?

@simhadri-g simhadri-g changed the title HIVE-27277: GH actions to build and push docker image HIVE-27277: GH actions to build linux/amd64 and arm64 images and push docker hub Aug 21, 2023
@simhadri-g
Member Author

simhadri-g commented Aug 21, 2023

> Whenever we have a new release we create a new tag under rel e.g., https://github.com/apache/hive/tree/rel/release-4.0.0-beta-1.
>
> Can't we simply launch the workflow on rel/tag creation, build the project, and publish from there?

We can, but there is one constraint with GitHub Actions. One Hive tar.gz is about 450 MB, so for linux/amd64 and linux/arm64 we would need almost 1 GB in total to store the build artifacts.

But the free GitHub storage for artifacts is only 0.5 GB (as far as I am aware). Once we exceed that threshold, other GitHub Actions are affected as well and will fail.
https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#included-storage-and-minutes

I am not sure whether Apache GitHub repos have additional limits.

To avoid hitting these limits, we decided to download the tars from the Apache archive.

Member Author

@simhadri-g simhadri-g left a comment

Updated the PR to build from source for new releases and to support building from binaries for older releases.

sonarcloud bot commented Aug 23, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

Warning: The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.


- name: 'Set up JDK 8'
  uses: actions/setup-java@v1
Contributor

AMD64 and ARM64 each need a different Java build. Is that magic handled by this build?

Member Author

Docker documents support for multi-arch builds, which is what we are using here:
https://docs.docker.com/build/ci/github-actions/multi-platform/

Let me confirm this in the final arm64 images.

Member Author

We use openjdk/8-jre-slim, which has both arm64 and amd64 images.
I think the matching image is pulled according to the target architecture. Let me confirm.
https://hub.docker.com/layers/library/openjdk/8-jre-slim/images/sha256-885d7cea2430cd637b3592118e1d52abdad90300e2e491e7b457319edd39123d
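
One way to check which platforms a tag provides is to inspect its manifest list. A small sketch, assuming a manually triggered workflow (the workflow and job names are made up) and the Docker Hub official image name `openjdk:8-jre-slim`:

```yaml
# Hypothetical sketch for inspecting a base image; not part of this PR.
name: inspect-base-image-sketch

on: workflow_dispatch

jobs:
  inspect:
    runs-on: ubuntu-latest
    steps:
      - uses: docker/setup-buildx-action@v2
      # Prints the manifest list, including one entry per published platform.
      - name: List platforms published for the base image
        run: docker buildx imagetools inspect openjdk:8-jre-slim
```

The same `docker buildx imagetools inspect` command can be run locally, or against a published Hive image tag, to see whether both amd64 and arm64 entries are present.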

Contributor

I tested it on macOS with an M1 chip, but it does not work.

Member Author

Hi,

I just verified on an M2 Mac:

>>>docker pull simhadri064/hive:4.0.0-beta-1-snapshot

>>>docker images                                                 
REPOSITORY         TAG                     IMAGE ID       CREATED        SIZE
simhadri064/hive   4.0.0-beta-1-snapshot   eaf428cf8e2c   36 hours ago   1.43GB

>>>export HIVE_VERSION=4.0.0-beta-1-snapshot
>>>docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 simhadri064/hive:${HIVE_VERSION}
8bafa42f8724b09cb9487a218c9619cd27811033e376d611f5e13134c92a81e0
>>>

Running queries via beeline

beeline> !connect jdbc:hive2://localhost:10000/;

0: jdbc:hive2://localhost:10000/> create table hive_example(a string, b int) partitioned by(c int);
INFO  : Compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.023 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.214 seconds
No rows affected (0.251 seconds)
0: jdbc:hive2://localhost:10000/> insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
INFO  : Compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 1.033 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO  : Query ID = hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: insert into hive_exam...... ('a', 2),('b',3) (Stage-1)
INFO  : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313], Dag ID: [dag_1692872917513_0001_1], DAG Session ID: [application_1692872917513_0001]
INFO  : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 0.91 s
----------------------------------------------------------------------------------------------
INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table default.hive_example partition (c=1) from file:/opt/hive/data/warehouse/hive_example/c=1/.hive-staging_hive_2023-08-24_10-28-36_073_5938108838462584033-1/-ext-10000
INFO  : Starting task [Stage-3:STATS] in serial mode
INFO  : Executing stats task
INFO  : Partition {c=1} stats: [numFiles=1, numRows=3, totalSize=12, rawDataSize=9, numFilesErasureCoded=0]
INFO  : Completed executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 2.038 seconds
3 rows affected (3.118 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> select count(distinct a) from hive_example;
INFO  : Compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.2 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO  : Query ID = hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO  : Session is already open
INFO  : Dag name: select count(distinct a) from hive_example (Stage-1)
INFO  : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56], Dag ID: [dag_1692872917513_0001_2], DAG Session ID: [application_1692872917513_0001]
INFO  : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)

INFO  : Completed executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.587 seconds
+------+
| _c0  |
+------+
| 2    |
+------+
1 row selected (0.832 seconds)
0: jdbc:hive2://localhost:10000/>

Contributor

Tested, and works now. Thanks!

Contributor

@zabetak zabetak left a comment

LGTM, thanks for making this perfect @simhadri-g !

If we want to test it, and create a fake tag after this change gets in, is it possible to remove that image afterwards from dockerhub?

@simhadri-g
Member Author

simhadri-g commented Aug 24, 2023

Thanks for the review @zabetak , @dengzhhu653 @nrg4878 @aturoczy @zratkai ! :)

> If we want to test it, and create a fake tag after this change gets in, is it possible to remove that image afterwards from dockerhub?

Yes, I think it should be possible.
I think currently only @ayushtkn has access to the Hive Docker Hub account to delete images.

@zabetak zabetak closed this in b7e3e1d Aug 25, 2023
scarlin-cloudera pushed a commit to scarlin-cloudera/hive that referenced this pull request Aug 29, 2023
…dappa reviewed by Stamatis Zampetakis, Zhihua Deng, Naveen Gangam, Attila Turoczy, Zoltan Ratkai)

1. Add (manually triggered) action for building images from archives
(for old releases)
2. Add action for building images from sources automatically triggered
on new tag creation.
3. Publish docker images for amd64 and arm64 platforms to dockerhub

Closes apache#4614
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023