docs: detail how to mount and use external drivers #449

Merged on Apr 25, 2024 (19 commits)

5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

### Added

- Added documentation/tutorial on using external database drivers ([#449]).

### Changed

- BREAKING: Switch to new image that only contains HMS.
@@ -12,6 +16,7 @@ All notable changes to this project will be documented in this file.
`metastore-log4j2.properties` ([#447]).

[#447]: https://github.com/stackabletech/hive-operator/pull/447
[#449]: https://github.com/stackabletech/hive-operator/pull/449

## [24.3.0] - 2024-03-20

3 changes: 3 additions & 0 deletions docs/modules/hive/pages/required-external-components.adoc
@@ -8,3 +8,6 @@ The Hive Metastore requires a backend SQL database. Supported databases and vers
* MS SQL Server 2008 R2 and above

Reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration#AdminManualMetastoreAdministration-SupportedBackendDatabasesforMetastore[Hive Metastore documentation]

The Stackable product images for Apache Hive come with built-in support for PostgreSQL.
See xref:usage-guide/database-driver.adoc[] for details on how to make drivers for other databases (supported by Hive) available.
229 changes: 229 additions & 0 deletions docs/modules/hive/pages/usage-guide/database-driver.adoc
@@ -0,0 +1,229 @@
= Database drivers

The Stackable product images for Apache Hive come with built-in support for using PostgreSQL as the metastore database.
The MySQL driver is not shipped in our images due to licensing issues.
To use another supported database it is necessary to make the relevant drivers available to Hive: this tutorial shows how this is done for MySQL.

== Install the MySQL Helm chart

[source,bash]
----
helm install mysql oci://registry-1.docker.io/bitnamicharts/mysql \
  --set auth.database=hive \
  --set auth.username=hive \
  --set auth.password=hive
----
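
Before continuing, it can be useful to wait until the database pod is ready. The following is a minimal sketch rather than part of the original tutorial: the helper name `wait_for_release` and the default timeout are illustrative, and it assumes the chart labels its pods with `app.kubernetes.io/instance=<release name>`.

```shell
# Hypothetical helper: wait until the pods of a Helm release report Ready.
# Assumption: the chart sets the label app.kubernetes.io/instance=<release>.
# $1 = release name, $2 = optional timeout (300s is an arbitrary default).
wait_for_release() {
  kubectl wait --for=condition=Ready pod \
    --selector="app.kubernetes.io/instance=$1" \
    --timeout="${2:-300s}"
}
```

With the release installed as above, `wait_for_release mysql` should return once the MySQL pod is ready, or fail when the timeout expires.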

== Download the driver to a PersistentVolumeClaim

.Create a PersistentVolumeClaim
[source,yaml]
----
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-hive-drivers
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
----

Download the driver, e.g. from https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/[Maven Central], to a volume backed by the PVC:

.Download the driver
[source,yaml]
----
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-hive-job
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: external-drivers
          persistentVolumeClaim:
            claimName: pvc-hive-drivers
      initContainers:
        - name: dest-dir
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-c",
              "mkdir -p ${DEST_DIR} && chown stackable:stackable ${DEST_DIR} && chmod -R a=,u=rwX,g=rwX ${DEST_DIR}",
            ]
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
      containers:
        - name: hive-driver
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-c",
              "curl -L https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/mysql-connector-j-8.0.31.jar \
              -o ${DEST_DIR}/mysql-connector-j-8.0.31.jar",
            ]
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
----

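The job writes the driver to the root of the volume (mounted at `/stackable/externals` inside the job).
The driver is therefore available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` whenever the volume `external-drivers` is mounted at `/stackable/external-drivers`.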
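
To find out when the download job has finished, one option is to wait on the Job's `complete` condition. This is a minimal sketch, assuming the manifests above have been applied to the current namespace; the helper name `wait_for_job` and its default timeout are illustrative and not part of any Stackable tooling.

```shell
# Hypothetical helper: block until the named Job reports the "complete" condition.
# $1 = Job name, $2 = optional timeout (300s is an arbitrary default).
wait_for_job() {
  kubectl wait --for=condition=complete "job/$1" --timeout="${2:-300s}"
}
```

For the job defined above this would be called as `wait_for_job pvc-hive-job`; the command exits with a non-zero status if the job has not completed within the timeout.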

Once the above has completed successfully, you can confirm that the driver is in the expected location by running another job:

[source,yaml]
----
---
apiVersion: batch/v1
kind: Job
metadata:
  name: list-drivers-job
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: external-drivers
          persistentVolumeClaim:
            claimName: pvc-hive-drivers
      containers:
        - name: hive-driver
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-o",
              "pipefail",
              "-c",
              "stat ${DEST_DIR}/mysql-connector-j-8.0.31.jar",
            ]
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
----
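
The result of the `stat` call ends up in the job's pod logs. As a sketch (the helper name `show_job_output` and the 120s timeout are assumptions made for illustration), the logs can be read once the job has completed:

```shell
# Hypothetical helper: wait for a Job to complete, then print its pod logs.
# $1 = Job name. The 120s timeout is an arbitrary choice.
show_job_output() {
  kubectl wait --for=condition=complete "job/$1" --timeout=120s &&
    kubectl logs "job/$1"
}
```

Calling `show_job_output list-drivers-job` should print the file details reported by `stat` if the driver is in place.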

== Create a Hive cluster

The MySQL connection details can then be used in the definition of the Hive Metastore resource.
Note that it is also necessary to "tell" Hive where to find the driver.
This is done by setting the value of the environment variable `METASTORE_AUX_JARS_PATH` to the path of the mounted driver:

[source,yaml]
----
---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive-with-drivers
spec:
  image:
    productVersion: 3.1.3
  clusterConfig:
    database:
      connString: jdbc:mysql://mysql:3306/hive # <1>
      user: hive # <2>
      password: hive
      dbType: mysql
    s3:
      reference: minio # <3>
  metastore:
    roleGroups:
      default:
        envOverrides:
          METASTORE_AUX_JARS_PATH: "/stackable/external-drivers/mysql-connector-j-8.0.31.jar" # <4>
        podOverrides: # <5>
          spec:
            containers:
              - name: hive
                volumeMounts:
                  - name: external-drivers
                    mountPath: /stackable/external-drivers
            volumes:
              - name: external-drivers
                persistentVolumeClaim:
                  claimName: pvc-hive-drivers
        replicas: 1
----

<1> The database connection details matching those given when deploying the MySQL Helm chart
<2> Plain-text Hive credentials will be replaced in an upcoming release!
<3> A reference to the file store using S3 (this has been omitted from this article for the sake of brevity, but is described in e.g. the xref:getting_started/first_steps.adoc[] guide)
<4> Use `envOverrides` to set the driver path
<5> Use `podOverrides` to mount the driver

NOTE: This has been tested on Azure AKS and Amazon EKS, both running Kubernetes 1.29.
The example uses a PVC with the access mode `ReadWriteOnce` because there is a single metastore instance, which is deployed only once the jobs have completed.
As long as the jobs and the metastore run one after another rather than concurrently, they can be scheduled on different nodes.
Other scenarios may require a different access mode, the availability of which depends on the type of cluster in use.
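
When deciding whether a different access mode is needed, it can help to check which modes the existing claim was created with. A small sketch, with the helper name `pvc_access_modes` chosen only for illustration:

```shell
# Hypothetical helper: print the access modes requested by a PersistentVolumeClaim.
# $1 = PVC name in the current namespace.
pvc_access_modes() {
  kubectl get pvc "$1" -o "jsonpath={.spec.accessModes}"
}
```

For the claim used in this tutorial the call would be `pvc_access_modes pvc-hive-drivers`.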

== Alternative: using a custom image

If you have access to a registry to store custom images, another approach is to build a custom image on top of a Stackable product image and "bake" the driver into it directly:

.Copy the driver
[source,dockerfile]
----
FROM docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev

RUN curl --fail -L https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/mysql-connector-j-8.0.31.jar -o /stackable/mysql-connector-j-8.0.31.jar
----

.Build and tag the image
[source,bash]
----
docker build -f ./Dockerfile -t docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql .
----
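
Before referencing the image in a cluster definition, the presence of the driver can be checked locally. This is a sketch under stated assumptions: the helper name `check_driver_in_image` is illustrative, and it assumes that `ls` is available inside the image.

```shell
# Hypothetical helper: list a file inside an image by overriding the entrypoint.
# Assumption: the image contains the `ls` binary.
# $1 = image reference, $2 = path expected inside the image.
check_driver_in_image() {
  docker run --rm --entrypoint ls "$1" -l "$2"
}
```

For the image built above this would be `check_driver_in_image docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql /stackable/mysql-connector-j-8.0.31.jar`. The image also has to be pushed to a registry that the Kubernetes cluster can pull from.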

.Reference the new path to the driver without the need for using a volume mounted from a PVC
[source,yaml]
----
---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive
spec:
  image:
    custom: docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql # <1>
    productVersion: 3.1.3
  clusterConfig:
    database:
      ...
    s3:
      ...
  metastore:
    config:
      logging:
        enableVectorAgent: False
    roleGroups:
      default:
        envOverrides:
          METASTORE_AUX_JARS_PATH: "/stackable/mysql-connector-j-8.0.31.jar" # <2>
        replicas: 1
----

<1> Name of the custom image containing the driver
<2> Path to the driver
1 change: 1 addition & 0 deletions docs/modules/hive/partials/nav.adoc
@@ -6,6 +6,7 @@
** xref:hive:usage-guide/listenerclass.adoc[]
** xref:hive:usage-guide/data-storage.adoc[]
** xref:hive:usage-guide/derby-example.adoc[]
** xref:hive:usage-guide/database-driver.adoc[]
** xref:hive:usage-guide/logging.adoc[]
** xref:hive:usage-guide/monitoring.adoc[]
** xref:hive:usage-guide/resources.adoc[]