HIVE-27277: GH actions to build linux/amd64 and arm64 images and push docker hub #4614
Conversation
Set up GitHub Actions workflow to build and push Docker images to Docker Hub
Whenever we have a new release we create a new tag under rel, e.g., https://github.com/apache/hive/tree/rel/release-4.0.0-beta-1. Can't we simply launch the workflow on rel/tag creation, build the project, and publish from there?
We can. But there is one constraint with GitHub Actions: one Hive tar.gz is about ~450 MB, so for linux/amd64 and arm64 we would need almost 1 GB in total to store the build artifacts. GitHub's free storage for artifacts is only 0.5 GB (as far as I am aware). Once we exceed that threshold it affects other GitHub Actions as well, and actions will fail. I am not sure if Apache GitHub repos have additional limits. To avoid hitting these limits, we decided to download the tars from the Apache archive.
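For illustration, downloading a released binary tarball from the Apache archive instead of storing it as a workflow artifact could look roughly like this (the URL layout is an assumption based on the standard Apache dist structure, not code from the PR):

```shell
# Sketch: fetch a released Hive binary from the Apache archive rather than
# passing a ~450 MB tarball between jobs as a workflow artifact.
HIVE_VERSION="4.0.0-beta-1"
TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${TARBALL}"
echo "Would download: ${URL}"
# curl -fSL "${URL}" -o "${TARBALL}"   # actual download step, omitted in this sketch
```

Since the archive serves every past release, the same step works for any version passed in as a workflow input.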
Updated the PR to build from source for new releases, and to support building from binaries for older releases.
Tested the build-from-source workflow on my personal account here: https://github.com/simhadri-g/hive/actions/runs/5944353617/job/16121443889
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No coverage information. The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
name: 'Set up JDK 8'
uses: actions/setup-java@v1
AMD64 and ARM64 need different Java builds. Is this handled automatically by this build?
Docker supports multi-platform builds in GitHub Actions, which is what we are using here:
https://docs.docker.com/build/ci/github-actions/multi-platform/
Let me confirm the final images include arm64.
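Following the linked Docker docs, a minimal multi-platform build-and-push step might look like the sketch below (action versions, secret names, and tags are illustrative assumptions, not taken from the PR):

```yaml
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # QEMU lets the amd64 runner emulate arm64 during the build
      - uses: docker/setup-qemu-action@v2
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USER }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - uses: docker/build-push-action@v4
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: apache/hive:${{ github.ref_name }}
```

The `platforms` input is what produces the multi-arch manifest: buildx builds both variants and pushes a single tag that resolves per architecture at pull time.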
We use openjdk:8-jre-slim, which has both arm64 and amd64 images.
I think the correct one will be pulled according to the architecture. Let me confirm.
https://hub.docker.com/layers/library/openjdk/8-jre-slim/images/sha256-885d7cea2430cd637b3592118e1d52abdad90300e2e491e7b457319edd39123d
I tested it on macOS with an M1 chip, but it does not work.
Hi,
I just verified on an M2 Mac:
>>>docker pull simhadri064/hive:4.0.0-beta-1-snapshot
>>>docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
simhadri064/hive 4.0.0-beta-1-snapshot eaf428cf8e2c 36 hours ago 1.43GB
>>>export HIVE_VERSION=4.0.0-beta-1-snapshot
>>>docker run -d -p 10000:10000 -p 10002:10002 --env SERVICE_NAME=hiveserver2 --name hive4 simhadri064/hive:${HIVE_VERSION}
8bafa42f8724b09cb9487a218c9619cd27811033e376d611f5e13134c92a81e0
>>>
Running queries via beeline
beeline> !connect jdbc:hive2://localhost:10000/;
0: jdbc:hive2://localhost:10000/> create table hive_example(a string, b int) partitioned by(c int);
INFO : Compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.023 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1): create table hive_example(a string, b int) partitioned by(c int)
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20230824102827_11655e00-6df2-4f5e-8852-87730b221bb1); Time taken: 0.214 seconds
No rows affected (0.251 seconds)
0: jdbc:hive2://localhost:10000/> insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
INFO : Compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 1.033 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313): insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)
INFO : Query ID = hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into hive_exam...... ('a', 2),('b',3) (Stage-1)
INFO : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313], Dag ID: [dag_1692872917513_0001_1], DAG Session ID: [application_1692872917513_0001]
INFO : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 0.91 s
----------------------------------------------------------------------------------------------
INFO : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table default.hive_example partition (c=1) from file:/opt/hive/data/warehouse/hive_example/c=1/.hive-staging_hive_2023-08-24_10-28-36_073_5938108838462584033-1/-ext-10000
INFO : Starting task [Stage-3:STATS] in serial mode
INFO : Executing stats task
INFO : Partition {c=1} stats: [numFiles=1, numRows=3, totalSize=12, rawDataSize=9, numFilesErasureCoded=0]
INFO : Completed executing command(queryId=hive_20230824102836_75b77203-9b31-4b35-9627-fb9b6df20313); Time taken: 2.038 seconds
3 rows affected (3.118 seconds)
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/>
0: jdbc:hive2://localhost:10000/> select count(distinct a) from hive_example;
INFO : Compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.2 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56): select count(distinct a) from hive_example
INFO : Query ID = hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56
INFO : Session is already open
INFO : Dag name: select count(distinct a) from hive_example (Stage-1)
INFO : HS2 Host: [8bafa42f8724], Query ID: [hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56], Dag ID: [dag_1692872917513_0001_2], DAG Session ID: [application_1692872917513_0001]
INFO : Status: Running (Executing on YARN cluster with App id application_1692872917513_0001)
INFO : Completed executing command(queryId=hive_20230824102847_73f39ec5-11fd-4379-a206-a528f884ed56); Time taken: 0.587 seconds
+------+
| _c0 |
+------+
| 2 |
+------+
1 row selected (0.832 seconds)
0: jdbc:hive2://localhost:10000/>
Tested, and works now. Thanks!
LGTM, thanks for making this perfect @simhadri-g !
If we want to test it, and create a fake tag after this change gets in, is it possible to remove that image afterwards from dockerhub?
Thanks for the review @zabetak , @dengzhhu653 @nrg4878 @aturoczy @zratkai ! :)
Yes, I think it should be possible.
…dappa reviewed by Stamatis Zampetakis, Zhihua Deng, Naveen Gangam, Attila Turoczy, Zoltan Ratkai) 1. Add (manually triggered) action for building images from archives (for old releases) 2. Add action for building images from sources automatically triggered on new tag creation. 3. Publish docker images for amd64 and arm64 platforms to dockerhub Closes apache#4614
Set up GitHub Actions workflow to build and push Docker images to Docker Hub
Hi Everyone,
I have got the Docker Hub repository set up for Apache Hive from Infra.
https://issues.apache.org/jira/browse/INFRA-24505
DockerHub: https://hub.docker.com/r/apache/hive
In this PR I have set up a GitHub Actions workflow to automatically publish the Docker image to Docker Hub on every release.
Opening a new PR as the older PR got auto-closed: #4298
Updating the description after the latest changes:
In the latest patch the GitHub action is divided into two parts:
1. Build from existing binaries for old releases (BuildFromArchive, manually triggered).
2. Build from source on tag creation for new releases (BuildFromSource, auto-triggered on new tag creation).
This GitHub action publishes Docker images for both linux/amd64 and linux/arm64 platforms.
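The two triggers could be wired up roughly as below; the event names are standard GitHub Actions syntax, while the file names, tag pattern, and input name are assumptions for illustration:

```yaml
# build-from-source.yml — auto-triggered when a release tag is pushed
# (tag pattern assumed from the rel/release-* convention mentioned above)
on:
  push:
    tags:
      - 'rel/release-*'
---
# build-from-archive.yml — manually triggered for an already-released version
on:
  workflow_dispatch:
    inputs:
      hiveVersion:
        description: 'Released Hive version to build an image for'
        required: true
```

With `workflow_dispatch`, the archive build can be launched from the Actions tab for any past release by supplying the version as an input.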
Update 2:
We no longer have to worry about the GitHub storage limits. The images are now built and published in a single job, and any artifact created and consumed within a single job does not count towards the GitHub storage limit.
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?