From d3bca4373244947f099f1d69d349ec27cd017fbe Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 28 Aug 2023 09:35:17 +0200 Subject: [PATCH 01/10] Added metadata --- docs/modules/hive/pages/index.adoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index e0b8c854..61e21098 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -1,4 +1,6 @@ = Stackable Operator for Apache Hive +:description: The Stackable Operator for Apache Hive is a Kubernetes operator that can manage Apache Hive metastores. Learn about its features, resources, dependencies and demos, and see the list of supported Hive versions. +:keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query This is an operator for Kubernetes that can manage https://hive.apache.org[Apache Hive] metastores. The Apache Hive metastore (HMS) stores information on the location of tables and partitions in file and blob storages such as HDFS and S3. From 4f6762e56ff38e9cedc43a32aa2f967ce0ad2606 Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 28 Aug 2023 09:48:36 +0200 Subject: [PATCH 02/10] Better intro and some restructuring --- docs/modules/hive/pages/index.adoc | 33 +++++++++++------------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index 61e21098..4f6bbb4d 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -2,33 +2,24 @@ :description: The Stackable Operator for Apache Hive is a Kubernetes operator that can manage Apache Hive metastores. Learn about its features, resources, dependencies and demos, and see the list of supported Hive versions. 
:keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query -This is an operator for Kubernetes that can manage https://hive.apache.org[Apache Hive] metastores. -The Apache Hive metastore (HMS) stores information on the location of tables and partitions in file and blob storages such as HDFS and S3. - -Only the metastore is supported, not Hive itself. -There are several reasons why running Hive on Kubernetes may not be an optimal solution. -The most obvious reason is that Hive requires YARN as an execution framework, and YARN assumes much of the same role as Kubernetes - i.e. assigning resources. -For this reason we provide xref:trino:index.adoc[Trino] as a query engine in the Stackable Data Platform instead of Hive. Trino still uses the Hive Metastore, hence the inclusion of this operator as well. -There are multiple tools that can use the HMS: - -* HiveServer2 -** This is the "original" tool using the HMS. -** It offers an endpoint, where you can submit HiveQL (similar to SQL) queries. -** It needs a execution engine, e.g. YARN or Spark. -*** This operator does not support running the Hive server because of the complexity needed to operate YARN on Kubernetes. YARN is a resource manager which is not meant to be running on Kubernetes as Kubernetes already manages its own resources. -*** We offer Trino as a (often times drop-in) replacement (see below) -* Trino -** Takes SQL queries and executes them against the tables, whose metadata are stored in HMS. -** It should offer all the capabilities Hive offers including a lot of additional functionality, such as connections to other data sources. -* Spark -** Takes SQL or programmatic jobs and executes them against the tables, whose metadata are stored in HMS. -* And others +This is an operator for Kubernetes that can manage https://hive.apache.org[Apache Hive] metastores. +The Apache Hive metastore (HMS) was originally developed as part of Apache Hive. 
It stores information on the location of tables and partitions in file and blob storages such as HDFS and S3, and is now also used by tools other than Hive to access tables in files.
+This Operator does not support deploying Hive itself, but xref:trino:index.adoc[Trino] is recommended as an alternative query engine.

== Required external component: An SQL database

The Hive metastore requires a database to store metadata.
Consult the xref:required-external-components.adoc[required external components page] for an overview of the supported databases and minimum supported versions.

+== Why no Hive?
+
+Only the metastore is supported, not Hive itself.
+There are several reasons why running Hive on Kubernetes may not be an optimal solution.
+The most obvious reason is that Hive requires YARN as an execution framework, and YARN assumes much of the same role as Kubernetes, i.e. assigning resources.
+For this reason we provide xref:trino:index.adoc[Trino] as a query engine in the Stackable Data Platform instead of Hive. Trino still uses the Hive Metastore, hence the inclusion of this operator as well. Trino offers all the capabilities Hive does, along with a lot of additional functionality such as connections to other data sources.
+
+Additionally, tables in the HMS can also be accessed from xref:spark-k8s:index.adoc[Apache Spark].
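To make the last point concrete, an external engine such as Spark reaches the HMS through its thrift endpoint. The following is only a sketch using standard Spark settings, not something prescribed by this operator; the metastore hostname is a placeholder, and 9083 is the conventional HMS thrift port:

[source,properties]
----
# spark-defaults.conf (sketch; the hostname is a placeholder)
spark.sql.catalogImplementation   hive
spark.hadoop.hive.metastore.uris  thrift://my-hive-metastore:9083
----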
+ == Supported Versions The Stackable Operator for Apache Hive currently supports the following versions of Hive: From d8fb28b0c4acbc3910b002ea149251f96e7ccc15 Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 28 Aug 2023 11:18:25 +0200 Subject: [PATCH 03/10] Added diagram --- .../hive/images/hive_overview.drawio.svg | 4 ++++ docs/modules/hive/pages/index.adoc | 20 +++++++++++++++++++ 2 files changed, 24 insertions(+) create mode 100644 docs/modules/hive/images/hive_overview.drawio.svg diff --git a/docs/modules/hive/images/hive_overview.drawio.svg b/docs/modules/hive/images/hive_overview.drawio.svg new file mode 100644 index 00000000..693094b9 --- /dev/null +++ b/docs/modules/hive/images/hive_overview.drawio.svg @@ -0,0 +1,4 @@ + + + +
Pod
<name>-metastore-<rg1>-1
Pod...
Hive
Operator
Hive...
StatefulSet
<name>-metastore-<rg1>
StatefulSet...
Service
<name>-metastore-<rg1>
Service...
Pod
<name>-metastore-<rg1>-0
Pod...
ConfigMap
<name>-metastore-<rg1>
ConfigMap...
HiveCluster
<name>
HiveCluster...
create
create
read
read
Legend
Legend
Operator
Operator
Resource
Resource
Custom
Resource
Custom...
role group
<rg1>
role group...
Service
<name>-external
Service...
role
metastore
role...
references
references
ConfigMap
<name>
ConfigMap...
discovery
ConfigMap
discovery...
StatefulSet
<name>-metastore-<rg2>
StatefulSet...
Service
<name>-metastore-<rg2>
Service...
ConfigMap
<name>-metastore-<rg2>
ConfigMap...
role group
<rg2>
role group...
Pod
<name>-metastore-<rg2>-0
Pod...
Text is not SVG - cannot display
\ No newline at end of file diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index 4f6bbb4d..c103960a 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -6,6 +6,22 @@ This is an operator for Kubernetes that can manage https://hive.apache.org[Apach The Apache Hive metastore (HMS) was originally developed as part of Apache Hive. It stores information on the location of tables and partitions in file and blob storages such as HDFS and S3 and is now used by other tools besides Hive as well to access tables in files. This Operator does not support deploying Hive itself, but xref:trino:index.adoc[Trino] is recommended as an alternative query engine. +== Getting started + +TODO + +== Operator model + +The Operator manages the _HiveCluster_ custom resource. The cluster implements a single `metastore` xref:home:concepts:roles-and-role-groups.adoc[role]. + +image::hive_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable Operator for Apache Hive] + +discovery ConfigMap with a connect string + +== Dependencies + +The Hive Operator depends on the Stackable commons and secret operators. + == Required external component: An SQL database The Hive metastore requires a database to store metadata. @@ -20,6 +36,10 @@ For this reason we provide xref:trino:index.adoc[Trino] as a query engine in the Additionally, Tables in the HMS can also be accessed from xref:spark-k8s:index.adoc[Apache Spark]. 
+== Demos + +lakehouse pyspark-ml connect to Spark + == Supported Versions The Stackable Operator for Apache Hive currently supports the following versions of Hive: From 9056c23eccc3086225887481a9e35051530ee21a Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 28 Aug 2023 13:51:39 +0200 Subject: [PATCH 04/10] index page done --- .../hive/images/hive_overview.drawio.svg | 2 +- docs/modules/hive/pages/index.adoc | 24 ++++++++++++------- .../modules/hive/pages/usage-guide/index.adoc | 3 +++ .../hive/partials/supported-versions.adoc | 2 +- 4 files changed, 21 insertions(+), 10 deletions(-) create mode 100644 docs/modules/hive/pages/usage-guide/index.adoc diff --git a/docs/modules/hive/images/hive_overview.drawio.svg b/docs/modules/hive/images/hive_overview.drawio.svg index 693094b9..4d3ddf1f 100644 --- a/docs/modules/hive/images/hive_overview.drawio.svg +++ b/docs/modules/hive/images/hive_overview.drawio.svg @@ -1,4 +1,4 @@ -
Pod
<name>-metastore-<rg1>-1
Pod...
Hive
Operator
Hive...
StatefulSet
<name>-metastore-<rg1>
StatefulSet...
Service
<name>-metastore-<rg1>
Service...
Pod
<name>-metastore-<rg1>-0
Pod...
ConfigMap
<name>-metastore-<rg1>
ConfigMap...
HiveCluster
<name>
HiveCluster...
create
create
read
read
Legend
Legend
Operator
Operator
Resource
Resource
Custom
Resource
Custom...
role group
<rg1>
role group...
Service
<name>-external
Service...
role
metastore
role...
references
references
ConfigMap
<name>
ConfigMap...
discovery
ConfigMap
discovery...
StatefulSet
<name>-metastore-<rg2>
StatefulSet...
Service
<name>-metastore-<rg2>
Service...
ConfigMap
<name>-metastore-<rg2>
ConfigMap...
role group
<rg2>
role group...
Pod
<name>-metastore-<rg2>-0
Pod...
Text is not SVG - cannot display
\ No newline at end of file +
Pod
<name>-metastore-<rg1>-1
Pod...
Hive
Operator
Hive...
StatefulSet
<name>-metastore-<rg1>
StatefulSet...
Service
<name>-metastore-<rg1>
Service...
Pod
<name>-metastore-<rg1>-0
Pod...
ConfigMap
<name>-metastore-<rg1>
ConfigMap...
HiveCluster
<name>
HiveCluster...
create
create
read
read
Legend
Legend
Operator
Operator
Resource
Resource
Custom
Resource
Custom...
role group
<rg1>
role group...
Service
<name>
Service...
role
metastore
role...
references
references
ConfigMap
<name>
ConfigMap...
discovery
ConfigMap
discovery...
StatefulSet
<name>-metastore-<rg2>
StatefulSet...
Service
<name>-metastore-<rg2>
Service...
ConfigMap
<name>-metastore-<rg2>
ConfigMap...
role group
<rg2>
role group...
Pod
<name>-metastore-<rg2>-0
Pod...
Text is not SVG - cannot display
\ No newline at end of file
diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index c103960a..2cd1cb73 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -8,7 +8,9 @@ This Operator does not support deploying Hive itself, but xref:trino:index.adoc[

== Getting started

-TODO
+Follow the xref:getting_started/index.adoc[Getting started guide] to install the Stackable Hive Operator and its dependencies. It walks you through setting up a Hive metastore and connecting it to a demo PostgreSQL database and a MinIO instance to store data in.
+
+Afterwards, you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your Hive metastore configuration to your needs, or have a look at the <<demos>> for some example setups with either xref:trino:index.adoc[Trino] or xref:spark-k8s:index.adoc[Spark].

== Operator model

@@ -16,18 +18,28 @@ The Operator manages the _HiveCluster_ custom resource. The cluster implements a

image::hive_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable Operator for Apache Hive]

-discovery ConfigMap with a connect string
+For every role group, the Operator creates a ConfigMap and a StatefulSet, which can have multiple replicas (Pods). Every role group is accessible through its own Service, and there is a Service for the whole cluster.
+
+The Operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the Hive metastore instance. The discovery ConfigMap contains information on how to connect to the HMS.

== Dependencies

-The Hive Operator depends on the Stackable commons and secret operators.
+The Stackable Operator for Apache Hive depends on the Stackable xref:commons-operator:index.adoc[commons] and xref:secret-operator:index.adoc[secret] operators.

== Required external component: An SQL database

The Hive metastore requires a database to store metadata.
Consult the xref:required-external-components.adoc[required external components page] for an overview of the supported databases and minimum supported versions.

-== Why no Hive?
+[[demos]]
+== Demos
+
+Three demos make use of the Hive metastore.
+
+The xref:stackablectl::demos/spark-k8s-anomaly-detection-taxi-data.adoc[] and xref:stackablectl::demos/trino-taxi-data.adoc[] demos use the HMS to store metadata about taxi data. The first demo then analyzes the data using xref:spark-k8s:index.adoc[Apache Spark] and the second one using xref:trino:index.adoc[Trino].
+
+The xref:stackablectl::demos/data-lakehouse-iceberg-trino-spark.adoc[] demo is the biggest demo available. It uses both Spark and Trino for analysis.
+
+== Why is the Hive query engine not supported?

Only the metastore is supported, not Hive itself.
There are several reasons why running Hive on Kubernetes may not be an optimal solution.
@@ -36,10 +48,6 @@ For this reason we provide xref:trino:index.adoc[Trino] as a query engine in the

Additionally, tables in the HMS can also be accessed from xref:spark-k8s:index.adoc[Apache Spark].

-== Demos
-
-lakehouse pyspark-ml connect to Spark
-
== Supported Versions

The Stackable Operator for Apache Hive currently supports the following versions of Hive:

diff --git a/docs/modules/hive/pages/usage-guide/index.adoc b/docs/modules/hive/pages/usage-guide/index.adoc new file mode 100644 index 00000000..d1d593f0 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/index.adoc @@ -0,0 +1,3 @@
+= Usage guide
+
+TODO
\ No newline at end of file
diff --git a/docs/modules/hive/partials/supported-versions.adoc b/docs/modules/hive/partials/supported-versions.adoc index 47b846b7..14ce8d4a 100644 --- a/docs/modules/hive/partials/supported-versions.adoc +++ b/docs/modules/hive/partials/supported-versions.adoc @@ -2,5 +2,5 @@
// This is a separate file, since it is used by both the direct Hive-Operator documentation, and the overarching
// Stackable Platform documentation.
-- 2.3.9 - 3.1.3 +- 2.3.9 From 8ca2c016a914cbb8c5eb73892089a84e1e8606ac Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 28 Aug 2023 16:57:11 +0200 Subject: [PATCH 05/10] Split usage guide (still messy) --- .../configuration-environment-overrides.adoc | 92 +++++ .../hive/pages/usage-guide/derby-example.adoc | 142 +++++++ docs/modules/hive/pages/usage-guide/hdfs.adoc | 11 + .../hive/pages/usage-guide/logging.adoc | 18 + .../hive/pages/usage-guide/monitoring.adoc | 4 + .../hive/pages/usage-guide/pod-placement.adoc | 22 ++ .../hive/pages/usage-guide/resources.adoc | 34 ++ docs/modules/hive/pages/usage-guide/s3.adoc | 18 + docs/modules/hive/pages/usage.adoc | 354 ------------------ 9 files changed, 341 insertions(+), 354 deletions(-) create mode 100644 docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc create mode 100644 docs/modules/hive/pages/usage-guide/derby-example.adoc create mode 100644 docs/modules/hive/pages/usage-guide/hdfs.adoc create mode 100644 docs/modules/hive/pages/usage-guide/logging.adoc create mode 100644 docs/modules/hive/pages/usage-guide/monitoring.adoc create mode 100644 docs/modules/hive/pages/usage-guide/pod-placement.adoc create mode 100644 docs/modules/hive/pages/usage-guide/resources.adoc create mode 100644 docs/modules/hive/pages/usage-guide/s3.adoc delete mode 100644 docs/modules/hive/pages/usage.adoc diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc new file mode 100644 index 00000000..757df50e --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc @@ -0,0 +1,92 @@ += Configuration & Environment Overrides + +The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role). 
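As a sketch of the precedence rule just described, a property set at both levels resolves to the role group value; the property and the numbers below are illustrative only:

[source,yaml]
----
metastore:
  configOverrides:
    hive-site.xml:
      datanucleus.connectionPool.maxPoolSize: "10"  # role level
  roleGroups:
    default:
      configOverrides:
        hive-site.xml:
          datanucleus.connectionPool.maxPoolSize: "20"  # wins for the "default" role group
----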
+
IMPORTANT: Overriding certain properties which are set by the operator (such as the HTTP port) can interfere with the operator and lead to problems.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files:

- `hive-site.xml`
- `security.properties`

For example, if you want to set `datanucleus.connectionPool.maxPoolSize` for the metastore to 20, adapt the `metastore` section of the cluster resource like so:

[source,yaml]
----
metastore:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        hive-site.xml:
          datanucleus.connectionPool.maxPoolSize: "20"
      replicas: 1
----

Just as for the `config`, it is possible to specify this at role level as well:

[source,yaml]
----
metastore:
  configOverrides:
    hive-site.xml:
      datanucleus.connectionPool.maxPoolSize: "20"
  roleGroups:
    default:
      config: [...]
      replicas: 1
----

All override property values must be strings. The properties are formatted and escaped correctly into the XML file.

For a full list of configuration options, refer to the Hive https://cwiki.apache.org/confluence/display/hive/configuration+properties[Configuration Reference].

== The security.properties file

The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use case that stands out, and that users need to be aware of: the JVM DNS cache.

The JVM manages its own cache of successfully resolved host names, as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them. As of version 3.1.3, Apache Hive performs poorly if the positive cache is disabled.
To cache resolved host names, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
metastore:
  configOverrides:
    security.properties:
      networkaddress.cache.ttl: "30"
      networkaddress.cache.negative.ttl: "0"
----

NOTE: The operator configures DNS caching by default as shown in the example above.

For details on JVM security, see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html


== Environment Variables

In a similar fashion, environment variables can be (over)written. For example per role group:

[source,yaml]
----
metastore:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
----

or per role:

[source,yaml]
----
metastore:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
----
diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc new file mode 100644 index 00000000..1d00d723 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc @@ -0,0 +1,142 @@

== Examples

Please note that the version you need to specify is not only the version of Apache Hive that you want to roll out; it has to be amended with a Stackable version as shown.
This Stackable version is the version of the underlying container image which is used to execute the processes.
For a list of available versions, please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry].
+ +.Create a single node Apache Hive Metastore cluster using Derby: +[source,yaml] +---- +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: HiveCluster +metadata: + name: simple-hive-derby +spec: + image: + productVersion: 3.1.3 + clusterConfig: + database: + connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true + user: APP + password: mine + dbType: derby + metastore: + roleGroups: + default: + replicas: 1 +---- + +WARNING: You should not use the `Derby` database with more than one replica or in production. Derby stores data locally and therefore the data is not shared between different metastore Pods and lost after Pod restarts. + +To create a single node Apache Hive Metastore (v2.3.9) cluster with derby and S3 access, deploy a minio (or use any available S3 bucket): +[source,bash] +---- +helm install minio \ + minio \ + --repo https://charts.bitnami.com/bitnami \ + --set auth.rootUser=minio-access-key \ + --set auth.rootPassword=minio-secret-key +---- + +In order to upload data to minio we need a port-forward to access the web ui. +[source,bash] +---- +kubectl port-forward service/minio 9001 +---- +Then, connect to localhost:9001 and login with the user `minio-access-key` and password `minio-secret-key`. Create a bucket and upload data. 
+ +Deploy the hive cluster: +[source,yaml] +---- +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: HiveCluster +metadata: + name: simple-hive-derby +spec: + image: + productVersion: 3.1.3 + clusterConfig: + database: + connString: jdbc:derby:;databaseName=/stackable/metastore_db;create=true + user: APP + password: mine + dbType: derby + s3: + inline: + host: minio + port: 9000 + accessStyle: Path + credentials: + secretClass: simple-hive-s3-secret-class + metastore: + roleGroups: + default: + replicas: 1 +--- +apiVersion: secrets.stackable.tech/v1alpha1 +kind: SecretClass +metadata: + name: simple-hive-s3-secret-class +spec: + backend: + k8sSearch: + searchNamespace: + pod: {} +--- +apiVersion: v1 +kind: Secret +metadata: + name: simple-hive-s3-secret + labels: + secrets.stackable.tech/class: simple-hive-s3-secret-class +stringData: + accessKey: minio-access-key + secretKey: minio-secret-key +---- + + +To create a single node Apache Hive Metastore using PostgreSQL, deploy a PostgreSQL instance via helm. + +[sidebar] +PostgreSQL introduced a new way to encrypt its passwords in version 10. +This is called `scram-sha-256` and has been the default as of PostgreSQL 14. +Unfortunately, Hive up until the latest 3.3.x version ships with JDBC drivers that do https://wiki.postgresql.org/wiki/List_of_drivers[_not_ support] this method. +You might see an error message like this: +`The authentication type 10 is not supported.` +If this is the case please either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`. 
This installs PostgreSQL in version 10 to work around the issue mentioned above:
[source,bash]
----
helm install hive bitnami/postgresql --version=12.1.5 \
--set postgresqlUsername=hive \
--set postgresqlPassword=hive \
--set postgresqlDatabase=hive
----

.Create Hive Metastore using a PostgreSQL database
[source,yaml]
----
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive-postgres
spec:
  image:
    productVersion: 3.1.3
  clusterConfig:
    database:
      connString: jdbc:postgresql://hive-postgresql.default.svc.cluster.local:5432/hive
      user: hive
      password: hive
      dbType: postgres
  metastore:
    roleGroups:
      default:
        replicas: 1
----

diff --git a/docs/modules/hive/pages/usage-guide/hdfs.adoc b/docs/modules/hive/pages/usage-guide/hdfs.adoc new file mode 100644 index 00000000..efac6998 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/hdfs.adoc @@ -0,0 +1,11 @@
= Apache HDFS Support

As well as S3, Hive also supports creating tables in HDFS.
You can add the HDFS connection in the top level `clusterConfig` as follows:

[source,yaml]
----
clusterConfig:
  hdfs:
    configMap: my-hdfs-cluster # Name of the HdfsCluster
----
diff --git a/docs/modules/hive/pages/usage-guide/logging.adoc b/docs/modules/hive/pages/usage-guide/logging.adoc new file mode 100644 index 00000000..fc99906f --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/logging.adoc @@ -0,0 +1,18 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery
  metastore:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in
diff --git a/docs/modules/hive/pages/usage-guide/monitoring.adoc b/docs/modules/hive/pages/usage-guide/monitoring.adoc new file mode 100644 index 00000000..edb8f867 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/monitoring.adoc @@ -0,0 +1,4 @@ += Monitoring + +The managed Hive instances are automatically configured to export Prometheus metrics. See +xref:operators:monitoring.adoc[] for more details. diff --git a/docs/modules/hive/pages/usage-guide/pod-placement.adoc b/docs/modules/hive/pages/usage-guide/pod-placement.adoc new file mode 100644 index 00000000..ebdb4071 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/pod-placement.adoc @@ -0,0 +1,22 @@ += Pod Placement + +You can configure Pod placement for Hive metastores as described in xref:concepts:pod_placement.adoc[]. + +By default, the operator configures the following Pod placement constraints: + +[source,yaml] +---- +affinity: + podAntiAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - podAffinityTerm: + labelSelector: + matchLabels: + app.kubernetes.io/name: hive + app.kubernetes.io/instance: cluster-name + app.kubernetes.io/component: metastore + topologyKey: kubernetes.io/hostname + weight: 70 +---- + +In the example above `cluster-name` is the name of the HiveCluster custom resource that owns this Pod. diff --git a/docs/modules/hive/pages/usage-guide/resources.adoc b/docs/modules/hive/pages/usage-guide/resources.adoc new file mode 100644 index 00000000..6c72daa1 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/resources.adoc @@ -0,0 +1,34 @@ +== Resource Requests + +include::home:concepts:stackable_resource_requests.adoc[] + +A minimal HA setup consisting of 2 Hive metastore instances has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]: + +* `100m` CPU request +* `3000m` CPU limit +* `1280m` memory request and limit + +Of course, additional services, require additional resources. 
For Stackable components, see the corresponding documentation on further resource requirements. + +Corresponding to the values above, the operator uses the following resource defaults: + +[source,yaml] +---- +metastore: + roleGroups: + default: + config: + resources: + requests: + cpu: "250m" + memory: "512Mi" + limits: + cpu: "1000m" + memory: "512Mi" +---- + +The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. Use xref:nightly@home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose. + +You can configure your own resource requests and limits by following the example above. + +For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods]. diff --git a/docs/modules/hive/pages/usage-guide/s3.adoc b/docs/modules/hive/pages/usage-guide/s3.adoc new file mode 100644 index 00000000..4390e667 --- /dev/null +++ b/docs/modules/hive/pages/usage-guide/s3.adoc @@ -0,0 +1,18 @@ += S3 Support + +Hive supports creating tables in S3 compatible object stores. +To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`. + +An example usage can look like this: + +[source,yaml] +---- +clusterConfig: + s3: + inline: + host: minio + port: 9000 + accessStyle: Path + credentials: + secretClass: simple-hive-s3-secret-class +---- diff --git a/docs/modules/hive/pages/usage.adoc b/docs/modules/hive/pages/usage.adoc deleted file mode 100644 index 3f5bd865..00000000 --- a/docs/modules/hive/pages/usage.adoc +++ /dev/null @@ -1,354 +0,0 @@ -= Usage - -== Requirements -Apache Hive Metastores need a relational database to store their state. 
-We currently support https://www.postgresql.org/[PostgreSQL] and https://db.apache.org/derby/[Apache Derby] (embedded database, not recommended for production). -Other databases might work if JDBC drivers are available. -Please open an https://github.com/stackabletech/hive-operator/issues[issue] if you require support for another database. - -== S3 Support - -Hive supports creating tables in S3 compatible object stores. -To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`. - -An example usage can look like this: - -[source,yaml] ----- -clusterConfig: - s3: - inline: - host: minio - port: 9000 - accessStyle: Path - credentials: - secretClass: simple-hive-s3-secret-class ----- - -== Apache HDFS Support - -As well as S3, Hive also supports creating tables in HDFS. -You can add the HDFS connection in the top level `clusterConfig` as follows: - -[source,yaml] ----- -clusterConfig: - hdfs: - configMap: my-hdfs-cluster # Name of the HdfsCluster ----- - -== Monitoring - -The managed Hive instances are automatically configured to export Prometheus metrics. See -xref:operators:monitoring.adoc[] for more details. - -== Log aggregation - -The logs can be forwarded to a Vector log aggregator by providing a discovery -ConfigMap for the aggregator and by enabling the log agent: - -[source,yaml] ----- -spec: - clusterConfig: - vectorAggregatorConfigMapName: vector-aggregator-discovery - metastore: - config: - logging: - enableVectorAgent: true ----- - -Further information on how to configure logging, can be found in -xref:concepts:logging.adoc[]. - -== Configuration & Environment Overrides - -The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role). 
- -IMPORTANT: Overriding certain properties, which are set by the operator (such as the HTTP port) can interfere with the operator and can lead to problems. - -=== Configuration Properties - -For a role or role group, at the same level of `config`, you can specify: `configOverrides` for the following files: - -- `hive-site.xml` -- `security.properties` - -For example, if you want to set the `datanucleus.connectionPool.maxPoolSize` for the metastore to 20 adapt the `metastore` section of the cluster resource like so: - -[source,yaml] ----- -metastore: - roleGroups: - default: - config: [...] - configOverrides: - hive-site.xml: - datanucleus.connectionPool.maxPoolSize: "20" - replicas: 1 ----- - -Just as for the `config`, it is possible to specify this at role level as well: - -[source,yaml] ----- -metastore: - configOverrides: - hive-site.xml: - datanucleus.connectionPool.maxPoolSize: "20" - roleGroups: - default: - config: [...] - replicas: 1 ----- - -All override property values must be strings. The properties will be formatted and escaped correctly into the XML file. - -For a full list of configuration options we refer to the Hive https://cwiki.apache.org/confluence/display/hive/configuration+properties[Configuration Reference]. - -=== The security.properties file - -The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache. - -The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled. 
To cache resolved host names, you can configure the TTL of entries in the positive cache like this: - -[source,yaml] ----- - metastores: - configOverrides: - security.properties: - networkaddress.cache.ttl: "30" - networkaddress.cache.negative.ttl: "0" ----- - -NOTE: The operator configures DNS caching by default as shown in the example above. - -For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html - - -=== Environment Variables - -In a similar fashion, environment variables can be (over)written. For example per role group: - -[source,yaml] ----- -metastore: - roleGroups: - default: - config: {} - envOverrides: - MY_ENV_VAR: "MY_VALUE" - replicas: 1 ----- - -or per role: - -[source,yaml] ----- -metastore: - envOverrides: - MY_ENV_VAR: "MY_VALUE" - roleGroups: - default: - config: {} - replicas: 1 ----- - -=== Resource Requests - -include::home:concepts:stackable_resource_requests.adoc[] - -A minimal HA setup consisting of 2 Hive metastore instances has the following https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/[resource requirements]: - -* `100m` CPU request -* `3000m` CPU limit -* `1280m` memory request and limit - -Of course, additional services, require additional resources. For Stackable components, see the corresponding documentation on further resource requirements. - -Corresponding to the values above, the operator uses the following resource defaults: - -[source,yaml] ----- -metastore: - roleGroups: - default: - config: - resources: - requests: - cpu: "250m" - memory: "512Mi" - limits: - cpu: "1000m" - memory: "512Mi" ----- - -The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. 
Use xref:nightly@home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose. - -You can configure your own resource requests and limits by following the example above. - -For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods]. - -== Examples - -Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown. -This Stackable version is the version of the underlying container image which is used to execute the processes. -For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhive%2Ftags[image registry]. -It should generally be safe to simply use the latest image version that is available. - -.Create a single node Apache Hive Metastore cluster using Derby: -[source,yaml] ----- ---- -apiVersion: hive.stackable.tech/v1alpha1 -kind: HiveCluster -metadata: - name: simple-hive-derby -spec: - image: - productVersion: 3.1.3 - clusterConfig: - database: - connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true - user: APP - password: mine - dbType: derby - metastore: - roleGroups: - default: - replicas: 1 ----- - -WARNING: You should not use the `Derby` database with more than one replica or in production. Derby stores data locally and therefore the data is not shared between different metastore Pods and lost after Pod restarts. - -To create a single node Apache Hive Metastore (v2.3.9) cluster with derby and S3 access, deploy a minio (or use any available S3 bucket): -[source,bash] ----- -helm install minio \ - minio \ - --repo https://charts.bitnami.com/bitnami \ - --set auth.rootUser=minio-access-key \ - --set auth.rootPassword=minio-secret-key ----- - -In order to upload data to minio we need a port-forward to access the web ui. 
-[source,bash] ----- -kubectl port-forward service/minio 9001 ----- -Then, connect to localhost:9001 and login with the user `minio-access-key` and password `minio-secret-key`. Create a bucket and upload data. - -Deploy the hive cluster: -[source,yaml] ----- ---- -apiVersion: hive.stackable.tech/v1alpha1 -kind: HiveCluster -metadata: - name: simple-hive-derby -spec: - image: - productVersion: 3.1.3 - clusterConfig: - database: - connString: jdbc:derby:;databaseName=/stackable/metastore_db;create=true - user: APP - password: mine - dbType: derby - s3: - inline: - host: minio - port: 9000 - accessStyle: Path - credentials: - secretClass: simple-hive-s3-secret-class - metastore: - roleGroups: - default: - replicas: 1 ---- -apiVersion: secrets.stackable.tech/v1alpha1 -kind: SecretClass -metadata: - name: simple-hive-s3-secret-class -spec: - backend: - k8sSearch: - searchNamespace: - pod: {} ---- -apiVersion: v1 -kind: Secret -metadata: - name: simple-hive-s3-secret - labels: - secrets.stackable.tech/class: simple-hive-s3-secret-class -stringData: - accessKey: minio-access-key - secretKey: minio-secret-key ----- - - -To create a single node Apache Hive Metastore using PostgreSQL, deploy a PostgreSQL instance via helm. - -[sidebar] -PostgreSQL introduced a new way to encrypt its passwords in version 10. -This is called `scram-sha-256` and has been the default as of PostgreSQL 14. -Unfortunately, Hive up until the latest 3.3.x version ships with JDBC drivers that do https://wiki.postgresql.org/wiki/List_of_drivers[_not_ support] this method. -You might see an error message like this: -`The authentication type 10 is not supported.` -If this is the case please either use an older PostgreSQL version or change its https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION[`password_encryption`] setting to `md5`. 
- -This installs PostgreSQL in version 10 to work around the issue mentioned above: -[source,bash] ----- -helm install hive bitnami/postgresql --version=12.1.5 \ ---set postgresqlUsername=hive \ ---set postgresqlPassword=hive \ ---set postgresqlDatabase=hive ----- - -.Create Hive Metastore using a PostgreSQL database -[source,yaml] ----- -apiVersion: hive.stackable.tech/v1alpha1 -kind: HiveCluster -metadata: - name: simple-hive-postgres -spec: - image: - productVersion: 3.1.3 - clusterConfig: - database: - connString: jdbc:postgresql://hive-postgresql.default.svc.cluster.local:5432/hive - user: hive - password: hive - dbType: postgres - metastore: - roleGroups: - default: - replicas: 1 ----- - -=== Pod Placement - -You can configure Pod placement for Hive metastores as described in xref:concepts:pod_placement.adoc[]. - -By default, the operator configures the following Pod placement constraints: - -[source,yaml] ----- -affinity: - podAntiAffinity: - preferredDuringSchedulingIgnoredDuringExecution: - - podAffinityTerm: - labelSelector: - matchLabels: - app.kubernetes.io/name: hive - app.kubernetes.io/instance: cluster-name - app.kubernetes.io/component: metastore - topologyKey: kubernetes.io/hostname - weight: 70 ----- - -In the example above `cluster-name` is the name of the HiveCluster custom resource that owns this Pod. From 26862e2047b633c81bd924b50720dc2a530c7207 Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Tue, 29 Aug 2023 11:20:06 +0200 Subject: [PATCH 06/10] ... 
--- .../pages/getting_started/first_steps.adoc | 2 +- docs/modules/hive/pages/index.adoc | 2 +- .../cluster-operations.adoc} | 2 +- .../configuration-environment-overrides.adoc | 2 +- .../hive/pages/usage-guide/data-storage.adoc | 37 +++++++++++++++++++ .../hive/pages/usage-guide/derby-example.adoc | 2 +- docs/modules/hive/pages/usage-guide/hdfs.adoc | 11 ------ .../modules/hive/pages/usage-guide/index.adoc | 3 +- .../hive/pages/usage-guide/pod-placement.adoc | 2 +- docs/modules/hive/pages/usage-guide/s3.adoc | 18 --------- docs/modules/hive/partials/nav.adoc | 10 ++++- 11 files changed, 53 insertions(+), 38 deletions(-) rename docs/modules/hive/pages/{cluster_operations.adoc => usage-guide/cluster-operations.adoc} (91%) create mode 100644 docs/modules/hive/pages/usage-guide/data-storage.adoc delete mode 100644 docs/modules/hive/pages/usage-guide/hdfs.adoc delete mode 100644 docs/modules/hive/pages/usage-guide/s3.adoc diff --git a/docs/modules/hive/pages/getting_started/first_steps.adoc b/docs/modules/hive/pages/getting_started/first_steps.adoc index 245de761..5991b7ae 100644 --- a/docs/modules/hive/pages/getting_started/first_steps.adoc +++ b/docs/modules/hive/pages/getting_started/first_steps.adoc @@ -72,4 +72,4 @@ For further testing we recommend to use e.g. the python https://github.com/quint == What's next -Have a look at the xref:usage.adoc[] page to find out more about the features of the Operator. +Have a look at the xref:usage-guide/index.adoc[usage guide] to find out more about the features of the Operator. diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index 2cd1cb73..b1804db3 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -3,7 +3,7 @@ :keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query This is an operator for Kubernetes that can manage https://hive.apache.org[Apache Hive] metastores. 
-The Apache Hive metastore (HMS) was originally developed as part of Apache Hive. It stores information on the location of tables and partitions in file and blob storages such as HDFS and S3 and is now used by other tools besides Hive as well to access tables in files.
+The Apache Hive metastore (HMS) was originally developed as part of Apache Hive. It stores information on the location of tables and partitions in file and blob storages such as xref:hdfs:index.adoc[Apache HDFS] and S3, and is now used by tools other than Hive to access tables in files.
 This Operator does not support deploying Hive itself, but xref:trino:index.adoc[Trino] is recommended as an alternative query engine.
 
 == Getting started
diff --git a/docs/modules/hive/pages/cluster_operations.adoc b/docs/modules/hive/pages/usage-guide/cluster-operations.adoc
similarity index 91%
rename from docs/modules/hive/pages/cluster_operations.adoc
rename to docs/modules/hive/pages/usage-guide/cluster-operations.adoc
index fa66dc68..f0a8db75 100644
--- a/docs/modules/hive/pages/cluster_operations.adoc
+++ b/docs/modules/hive/pages/usage-guide/cluster-operations.adoc
@@ -1,4 +1,4 @@
-= Cluster Operation
+= Cluster operation
 
 Hive installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:cluster_operations.adoc[cluster operations] for more details.
\ No newline at end of file
diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
index 757df50e..d90254c4 100644
--- a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
+++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc
@@ -1,4 +1,4 @@
-= Configuration & Environment Overrides
+= Configuration & environment overrides
 
 The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
 
diff --git a/docs/modules/hive/pages/usage-guide/data-storage.adoc b/docs/modules/hive/pages/usage-guide/data-storage.adoc
new file mode 100644
index 00000000..658aaa57
--- /dev/null
+++ b/docs/modules/hive/pages/usage-guide/data-storage.adoc
@@ -0,0 +1,37 @@
+= Data storage backends
+
+Hive does not store data, only metadata. It can store metadata about data stored in various places. The Stackable Operator currently supports S3 and HDFS.
+
+== [[s3]]S3 support
+
+Hive supports creating tables in S3 compatible object stores.
+To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`.
+
+An example usage can look like this:
+
+[source,yaml]
+----
+clusterConfig:
+  s3:
+    inline:
+      host: minio
+      port: 9000
+      accessStyle: Path
+      credentials:
+        secretClass: simple-hive-s3-secret-class
+----
+
+
+== [[hdfs]]Apache HDFS support
+
+As well as S3, Hive also supports creating tables in HDFS.
+You can add the HDFS connection in the top level `clusterConfig` as follows:
+
+[source,yaml]
+----
+clusterConfig:
+  hdfs:
+    configMap: my-hdfs-cluster # Name of the HdfsCluster
+----
+
+Read about the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS] to learn more about setting up HDFS.
diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc
index 1d00d723..9b6d804f 100644
--- a/docs/modules/hive/pages/usage-guide/derby-example.adoc
+++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc
@@ -1,5 +1,5 @@
-== Examples
+= Derby example
 
 Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown.
 This Stackable version is the version of the underlying container image which is used to execute the processes.
diff --git a/docs/modules/hive/pages/usage-guide/hdfs.adoc b/docs/modules/hive/pages/usage-guide/hdfs.adoc
deleted file mode 100644
index efac6998..00000000
--- a/docs/modules/hive/pages/usage-guide/hdfs.adoc
+++ /dev/null
@@ -1,11 +0,0 @@
-== Apache HDFS Support
-
-As well as S3, Hive also supports creating tables in HDFS.
-You can add the HDFS connection in the top level `clusterConfig` as follows:
-
-[source,yaml]
-----
-clusterConfig:
-  hdfs:
-    configMap: my-hdfs-cluster # Name of the HdfsCluster
-----
diff --git a/docs/modules/hive/pages/usage-guide/index.adoc b/docs/modules/hive/pages/usage-guide/index.adoc
index d1d593f0..2fc3aea2 100644
--- a/docs/modules/hive/pages/usage-guide/index.adoc
+++ b/docs/modules/hive/pages/usage-guide/index.adoc
@@ -1,3 +1,4 @@
 = Usage guide
+:page-aliases: usage.adoc
 
-TODO
\ No newline at end of file
+This section will help you use and configure the Stackable Operator for Apache Hive in various ways. You should already be familiar with how to set up a basic instance.
Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies. diff --git a/docs/modules/hive/pages/usage-guide/pod-placement.adoc b/docs/modules/hive/pages/usage-guide/pod-placement.adoc index ebdb4071..435bcc51 100644 --- a/docs/modules/hive/pages/usage-guide/pod-placement.adoc +++ b/docs/modules/hive/pages/usage-guide/pod-placement.adoc @@ -1,4 +1,4 @@ -= Pod Placement += Pod placement You can configure Pod placement for Hive metastores as described in xref:concepts:pod_placement.adoc[]. diff --git a/docs/modules/hive/pages/usage-guide/s3.adoc b/docs/modules/hive/pages/usage-guide/s3.adoc deleted file mode 100644 index 4390e667..00000000 --- a/docs/modules/hive/pages/usage-guide/s3.adoc +++ /dev/null @@ -1,18 +0,0 @@ -= S3 Support - -Hive supports creating tables in S3 compatible object stores. -To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`. 
- -An example usage can look like this: - -[source,yaml] ----- -clusterConfig: - s3: - inline: - host: minio - port: 9000 - accessStyle: Path - credentials: - secretClass: simple-hive-s3-secret-class ----- diff --git a/docs/modules/hive/partials/nav.adoc b/docs/modules/hive/partials/nav.adoc index 0c7c5137..fb13a99e 100644 --- a/docs/modules/hive/partials/nav.adoc +++ b/docs/modules/hive/partials/nav.adoc @@ -3,7 +3,13 @@ ** xref:hive:getting_started/first_steps.adoc[] * Concepts ** xref:hive:discovery.adoc[] -** xref:hive:cluster_operations.adoc[] -* xref:hive:usage.adoc[] * xref:hive:required-external-components.adoc[] +* xref:hive:usage-guide/index.adoc[] +** xref:hive:usage-guide/cluster-operations.adoc[] +** xref:hive:usage-guide/pod-placement.adoc[] +** xref:hive:usage-guide/data-storage.adoc[] +** xref:hive:usage-guide/derby-example.adoc[] +** xref:hive:usage-guide/logging.adoc[] +** xref:hive:usage-guide/monitoring.adoc[] +** xref:hive:usage-guide/configuration-environment-overrides.adoc[] * xref:hive:configuration.adoc[] From cc277601ce44e403b44f1536a902dd762f48649d Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Wed, 30 Aug 2023 09:39:14 +0200 Subject: [PATCH 07/10] Update docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc Co-authored-by: Malte Sander --- .../pages/usage-guide/configuration-environment-overrides.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc index d90254c4..1eb79245 100644 --- a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc +++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc @@ -47,7 +47,7 @@ For a full list of configuration options we refer to the Hive https://cwiki.apac The `security.properties` file is used to configure JVM security properties. 
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
 
-The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled. To cache resolved host names, you can configure the TTL of entries in the positive cache like this:
+The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them. As of version 3.1.3 Apache Hive performs poorly if the positive cache is disabled. To cache resolved host names, you can configure the TTL of entries in the positive cache like this:
 
 [source,yaml]
 ----

From 02bfe1a8fd5e2fe98de4f415c0ec59e630b2d9bc Mon Sep 17 00:00:00 2001
From: Felix Hennig
Date: Wed, 30 Aug 2023 09:39:26 +0200
Subject: [PATCH 08/10] Update docs/modules/hive/pages/usage-guide/derby-example.adoc

Co-authored-by: Malte Sander
---
 docs/modules/hive/pages/usage-guide/derby-example.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc
index 9b6d804f..cfa50441 100644
--- a/docs/modules/hive/pages/usage-guide/derby-example.adoc
+++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc
@@ -31,7 +31,7 @@ spec:
 
 WARNING: You should not use the `Derby` database with more than one replica or in production. Derby stores data locally and therefore the data is not shared between different metastore Pods and lost after Pod restarts.
-To create a single node Apache Hive Metastore (v2.3.9) cluster with derby and S3 access, deploy a minio (or use any available S3 bucket):
+To create a single node Apache Hive Metastore (v3.1.3) cluster with derby and S3 access, deploy a minio (or use any available S3 bucket):
 [source,bash]
 ----
 helm install minio \

From 802fdaa175e1f614120aea581dfd4ea7017e80fa Mon Sep 17 00:00:00 2001
From: Felix Hennig
Date: Wed, 30 Aug 2023 09:39:46 +0200
Subject: [PATCH 09/10] Update docs/modules/hive/pages/usage-guide/derby-example.adoc

Co-authored-by: Malte Sander
---
 docs/modules/hive/pages/usage-guide/derby-example.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc
index cfa50441..2d27296c 100644
--- a/docs/modules/hive/pages/usage-guide/derby-example.adoc
+++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc
@@ -29,7 +29,7 @@ spec:
         replicas: 1
 ----
 
-WARNING: You should not use the `Derby` database with more than one replica or in production. Derby stores data locally and therefore the data is not shared between different metastore Pods and lost after Pod restarts.
+WARNING: You should not use the `Derby` database in production. Derby stores data locally, which does not work in high availability setups (multiple replicas), and all data is lost after Pod restarts.
To create a single node Apache Hive Metastore (v3.1.3) cluster with derby and S3 access, deploy a minio (or use any available S3 bucket): [source,bash] From 1f8c86cbd1e4d5c85b57a1c1b64cd1c5f5c4ac6f Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Wed, 30 Aug 2023 10:51:13 +0200 Subject: [PATCH 10/10] Update docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc Co-authored-by: Malte Sander --- .../usage-guide/configuration-environment-overrides.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc index 1eb79245..847eacd0 100644 --- a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc +++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc @@ -18,7 +18,7 @@ For example, if you want to set the `datanucleus.connectionPool.maxPoolSize` for metastore: roleGroups: default: - config: [...] + config: {} configOverrides: hive-site.xml: datanucleus.connectionPool.maxPoolSize: "20" @@ -35,7 +35,7 @@ metastore: datanucleus.connectionPool.maxPoolSize: "20" roleGroups: default: - config: [...] + config: {} replicas: 1 ----
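The `configOverrides` hunks in the patches above set string values that end up in `hive-site.xml`; as the overrides documentation states, all values must be strings and are escaped into the XML file. As a rough illustration of what that rendering amounts to, here is a minimal sketch — `render_hive_site` is a made-up helper for illustration only, not the operator's actual implementation:

```python
from xml.sax.saxutils import escape


def render_hive_site(overrides: dict[str, str]) -> str:
    """Render string key/value overrides as a hive-site.xml document.

    Illustration only: values must already be strings (hence "20" is
    quoted in the custom resource examples) and are XML-escaped here.
    """
    props = "\n".join(
        "  <property>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <value>{escape(value)}</value>\n"
        "  </property>"
        for name, value in overrides.items()
    )
    return f"<configuration>\n{props}\n</configuration>\n"


print(render_hive_site({"datanucleus.connectionPool.maxPoolSize": "20"}))
```

The escaping matters for values containing `&`, `<` or `>`, which would otherwise produce invalid XML in the generated configuration file.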