From 026df9107c16111eb06f457e613ee27fd3ebb001 Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Mon, 16 Sep 2024 16:45:49 +0200 Subject: [PATCH] Add descriptions --- .../pages/getting_started/first_steps.adoc | 6 ++--- .../hive/pages/getting_started/index.adoc | 4 ++- .../pages/getting_started/installation.adoc | 27 +++++++++---------- docs/modules/hive/pages/index.adoc | 2 +- .../pages/required-external-components.adoc | 4 ++- .../configuration-environment-overrides.adoc | 5 ++-- .../hive/pages/usage-guide/data-storage.adoc | 1 + .../pages/usage-guide/database-driver.adoc | 3 ++- .../hive/pages/usage-guide/derby-example.adoc | 2 +- .../modules/hive/pages/usage-guide/index.adoc | 4 ++- .../hive/pages/usage-guide/listenerclass.adoc | 3 ++- .../hive/pages/usage-guide/logging.adoc | 7 +++-- .../hive/pages/usage-guide/monitoring.adoc | 5 ++-- .../hive/pages/usage-guide/resources.adoc | 3 ++- .../hive/pages/usage-guide/security.adoc | 3 ++- 15 files changed, 45 insertions(+), 34 deletions(-) diff --git a/docs/modules/hive/pages/getting_started/first_steps.adoc b/docs/modules/hive/pages/getting_started/first_steps.adoc index 2fae4d59..cc8e89a1 100644 --- a/docs/modules/hive/pages/getting_started/first_steps.adoc +++ b/docs/modules/hive/pages/getting_started/first_steps.adoc @@ -1,8 +1,8 @@ = First steps +:description: Deploy and verify a Hive metastore cluster with PostgreSQL and MinIO. Follow our setup guide and ensure all pods are ready for operation. -After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you -will now deploy a Hive metastore cluster and it's dependencies. Afterwards you can -<<_verify_that_it_works, verify that it works>>. +After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Hive metastore cluster and its dependencies. +Afterwards you can <<_verify_that_it_works, verify that it works>>. 
== Setup diff --git a/docs/modules/hive/pages/getting_started/index.adoc b/docs/modules/hive/pages/getting_started/index.adoc index c0f2798a..633e9c8c 100644 --- a/docs/modules/hive/pages/getting_started/index.adoc +++ b/docs/modules/hive/pages/getting_started/index.adoc @@ -1,6 +1,8 @@ = Getting started +:description: Learn to set up Apache Hive with the Stackable Operator. Includes installation, dependencies, and creating a Hive metastore on Kubernetes. -This guide will get you started with Apache Hive using the Stackable Operator. It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance. +This guide will get you started with Apache Hive using the Stackable Operator. +It will guide you through the installation of the operator, its dependencies and setting up your first Hive metastore instance. == Prerequisites diff --git a/docs/modules/hive/pages/getting_started/installation.adoc b/docs/modules/hive/pages/getting_started/installation.adoc index 98dc25e8..ab29a3d8 100644 --- a/docs/modules/hive/pages/getting_started/installation.adoc +++ b/docs/modules/hive/pages/getting_started/installation.adoc @@ -1,16 +1,16 @@ = Installation +:description: Install Stackable Operator for Apache Hive with MinIO and PostgreSQL using stackablectl or Helm. Follow our guide for easy setup and configuration. -On this page you will install the Stackable Operator for Apache Hive and all required dependencies. For the installation -of the dependencies and operators you can use Helm or `stackablectl`. +On this page you will install the Stackable Operator for Apache Hive and all required dependencies. +For the installation of the dependencies and operators you can use Helm or `stackablectl`. -The `stackablectl` command line tool is the recommended way to interact with operators and dependencies. 
Follow the -xref:management:stackablectl:installation.adoc[installation steps] for your platform if you choose to work with -`stackablectl`. +The `stackablectl` command line tool is the recommended way to interact with operators and dependencies. +Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform if you choose to work with `stackablectl`. == Dependencies -First you need to install MinIO and PostgreSQL instances for the Hive metastore. PostgreSQL is required as a database -for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to. +First you need to install MinIO and PostgreSQL instances for the Hive metastore. +PostgreSQL is required as a database for Hive's metadata, and MinIO will be used as a data store, which the Hive metastore also needs access to. There are two ways to install the dependencies: @@ -21,9 +21,8 @@ WARNING: The dependency installations in this guide are only intended for testin === stackablectl -`stackablectl` was designed to install Stackable components, but its xref:management:stackablectl:commands/stack.adoc[Stacks] -feature can also be used to install arbitrary Helm Charts. You can install MinIO and PostgreSQL using the Stacks feature -as follows, but a simpler method via Helm is shown <>. +`stackablectl` was designed to install Stackable components, but its xref:management:stackablectl:commands/stack.adoc[Stacks] feature can also be used to install arbitrary Helm Charts. +You can install MinIO and PostgreSQL using the Stacks feature as follows, but a simpler method via Helm is shown <>. [source,bash] ---- @@ -67,8 +66,8 @@ Now call `stackablectl` and reference those two files: include::example$getting_started/getting_started.sh[tag=stackablectl-install-minio-postgres-stack] ---- -This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators. You can now skip the -<> step that follows next. 
+This will install MinIO and PostgreSQL as defined in the Stacks, as well as the Operators. +You can now skip the <> step that follows next. TIP: Consult the xref:management:stackablectl:quickstart.adoc[Quickstart] to learn more about how to use `stackablectl`. @@ -133,8 +132,8 @@ Then install the Stackable operators: include::example$getting_started/getting_started.sh[tag=helm-install-operators] ---- -Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the -CRDs for the required operators). You are now ready to deploy the Apache Hive metastore in Kubernetes. +Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Apache Hive service (as well as the CRDs for the required operators). +You are now ready to deploy the Apache Hive metastore in Kubernetes. == What's next diff --git a/docs/modules/hive/pages/index.adoc b/docs/modules/hive/pages/index.adoc index e66e89b6..2e38dfca 100644 --- a/docs/modules/hive/pages/index.adoc +++ b/docs/modules/hive/pages/index.adoc @@ -1,5 +1,5 @@ = Stackable Operator for Apache Hive -:description: The Stackable Operator for Apache Hive is a Kubernetes operator that can manage Apache Hive metastores. Learn about its features, resources, dependencies and demos, and see the list of supported Hive versions. +:description: Manage Apache Hive metastores on Kubernetes with the Stackable Operator. Integrates with Trino and Spark. 
:keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query :hive: https://hive.apache.org :github: https://github.com/stackabletech/hive-operator/ diff --git a/docs/modules/hive/pages/required-external-components.adoc b/docs/modules/hive/pages/required-external-components.adoc index 2b1666e7..111acdbf 100644 --- a/docs/modules/hive/pages/required-external-components.adoc +++ b/docs/modules/hive/pages/required-external-components.adoc @@ -1,6 +1,8 @@ = Required external components +:description: Hive Metastore requires a SQL database. Supported options include MySQL, Postgres, Oracle, and MS SQL Server. Stackable Hive supports PostgreSQL by default. -The Hive Metastore requires a backend SQL database. Supported databases and versions are: +The Hive Metastore requires a backend SQL database. +Supported databases and versions are: * MySQL 5.6.17 and above * Postgres 9.1.13 and above diff --git a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc index 847eacd0..95b67825 100644 --- a/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc +++ b/docs/modules/hive/pages/usage-guide/configuration-environment-overrides.adoc @@ -1,4 +1,5 @@ = Configuration & environment overrides +:description: Override Hive config properties and environment variables at role or role group levels. Customize hive-site.xml, security.properties, and environment vars. The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role). 
@@ -8,8 +9,8 @@ IMPORTANT: Overriding certain properties, which are set by the operator (such as For a role or role group, at the same level of `config`, you can specify: `configOverrides` for the following files: -- `hive-site.xml` -- `security.properties` +* `hive-site.xml` +* `security.properties` For example, if you want to set the `datanucleus.connectionPool.maxPoolSize` for the metastore to 20 adapt the `metastore` section of the cluster resource like so: diff --git a/docs/modules/hive/pages/usage-guide/data-storage.adoc b/docs/modules/hive/pages/usage-guide/data-storage.adoc index 658aaa57..eb8a2b92 100644 --- a/docs/modules/hive/pages/usage-guide/data-storage.adoc +++ b/docs/modules/hive/pages/usage-guide/data-storage.adoc @@ -1,4 +1,5 @@ = Data storage backends +:description: Hive stores metadata about data held in S3 or HDFS. Configure S3 with S3Connection and HDFS with configMap in clusterConfig. Hive does not store data, only metadata. It can store metadata about data stored in various places. The Stackable Operator currently supports S3 and HFS. diff --git a/docs/modules/hive/pages/usage-guide/database-driver.adoc b/docs/modules/hive/pages/usage-guide/database-driver.adoc index 9b8d4cfb..45e38b24 100644 --- a/docs/modules/hive/pages/usage-guide/database-driver.adoc +++ b/docs/modules/hive/pages/usage-guide/database-driver.adoc @@ -1,7 +1,8 @@ = Database drivers +:description: Learn to configure Apache Hive with MySQL using Helm, PVCs, and custom images. Includes steps for driver setup and Hive cluster creation. The Stackable product images for Apache Hive come with built-in support for using PostgreSQL as the metastore database. -The MySQL driver is not shipped in our images due to licensing issues. +The MySQL driver is not shipped in Stackable images due to licensing issues. To use another supported database it is necessary to make the relevant drivers available to Hive: this tutorial shows how this is done for MySQL. 
== Install the MySQL helm chart diff --git a/docs/modules/hive/pages/usage-guide/derby-example.adoc b/docs/modules/hive/pages/usage-guide/derby-example.adoc index 56bee6bc..b0b358b2 100644 --- a/docs/modules/hive/pages/usage-guide/derby-example.adoc +++ b/docs/modules/hive/pages/usage-guide/derby-example.adoc @@ -1,5 +1,5 @@ - = Derby example +:description: Deploy a single-node Apache Hive Metastore with Derby or PostgreSQL. Includes setup for S3 integration and tips for database configuration. Please note that the version you need to specify is not only the version of Apache Hive which you want to roll out, but has to be amended with a Stackable version as shown. This Stackable version is the version of the underlying container image which is used to execute the processes. diff --git a/docs/modules/hive/pages/usage-guide/index.adoc b/docs/modules/hive/pages/usage-guide/index.adoc index 2fc3aea2..d00f5384 100644 --- a/docs/modules/hive/pages/usage-guide/index.adoc +++ b/docs/modules/hive/pages/usage-guide/index.adoc @@ -1,4 +1,6 @@ = Usage guide :page-aliases: usage.adoc -This Section will help you to use and configure the Stackable Operator for Apache Hive in various ways. You should already be familiar with how to set up a basic instance. Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies. +This section will help you to use and configure the Stackable Operator for Apache Hive in various ways. +You should already be familiar with how to set up a basic instance. +Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies. 
diff --git a/docs/modules/hive/pages/usage-guide/listenerclass.adoc b/docs/modules/hive/pages/usage-guide/listenerclass.adoc index 14f51e51..02b75e53 100644 --- a/docs/modules/hive/pages/usage-guide/listenerclass.adoc +++ b/docs/modules/hive/pages/usage-guide/listenerclass.adoc @@ -1,6 +1,7 @@ = Service exposition with ListenerClasses -Apache Hive offers an API. The Operator deploys a service called `` (where `` is the name of the HiveCluster) through which Hive can be reached. +Apache Hive offers an API. +The Operator deploys a service called `` (where `` is the name of the HiveCluster) through which Hive can be reached. This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`. Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level. diff --git a/docs/modules/hive/pages/usage-guide/logging.adoc b/docs/modules/hive/pages/usage-guide/logging.adoc index fc99906f..169fee8a 100644 --- a/docs/modules/hive/pages/usage-guide/logging.adoc +++ b/docs/modules/hive/pages/usage-guide/logging.adoc @@ -1,7 +1,7 @@ = Log aggregation +:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent. -The logs can be forwarded to a Vector log aggregator by providing a discovery -ConfigMap for the aggregator and by enabling the log agent: +The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent: [source,yaml] ---- @@ -14,5 +14,4 @@ spec: enableVectorAgent: true ---- -Further information on how to configure logging, can be found in -xref:concepts:logging.adoc[]. +Further information on how to configure logging can be found in xref:concepts:logging.adoc[]. 
diff --git a/docs/modules/hive/pages/usage-guide/monitoring.adoc b/docs/modules/hive/pages/usage-guide/monitoring.adoc index edb8f867..f7c971ed 100644 --- a/docs/modules/hive/pages/usage-guide/monitoring.adoc +++ b/docs/modules/hive/pages/usage-guide/monitoring.adoc @@ -1,4 +1,5 @@ = Monitoring +:description: The managed Hive instances are automatically configured to export Prometheus metrics. -The managed Hive instances are automatically configured to export Prometheus metrics. See -xref:operators:monitoring.adoc[] for more details. +The managed Hive instances are automatically configured to export Prometheus metrics. +See xref:operators:monitoring.adoc[] for more details. diff --git a/docs/modules/hive/pages/usage-guide/resources.adoc b/docs/modules/hive/pages/usage-guide/resources.adoc index 764b6a60..8bf4eb47 100644 --- a/docs/modules/hive/pages/usage-guide/resources.adoc +++ b/docs/modules/hive/pages/usage-guide/resources.adoc @@ -1,4 +1,5 @@ = Resource requests +:description: Set CPU and memory requests for Hive metastore in Kubernetes. Default values and customization options are provided for optimal resource management. include::home:concepts:stackable_resource_requests.adoc[] @@ -27,7 +28,7 @@ metastore: memory: "512Mi" ---- -The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. Use xref:nightly@home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose. +The operator may configure an additional container for log aggregation. This is done when log aggregation is configured as described in xref:concepts:logging.adoc[]. The resources for this container cannot be configured using the mechanism described above. Use xref:home:concepts:overrides.adoc#_pod_overrides[podOverrides] for this purpose. 
You can configure your own resource requests and limits by following the example above. diff --git a/docs/modules/hive/pages/usage-guide/security.adoc b/docs/modules/hive/pages/usage-guide/security.adoc index db6a86d3..69fe7654 100644 --- a/docs/modules/hive/pages/usage-guide/security.adoc +++ b/docs/modules/hive/pages/usage-guide/security.adoc @@ -1,4 +1,5 @@ = Security +:description: Secure Apache Hive with Kerberos authentication in Kubernetes. Configure Kerberos server, SecretClass, and access Hive securely with provided guides. == Authentication Currently, the only supported authentication mechanism is Kerberos, which is disabled by default. @@ -17,7 +18,7 @@ The next step is to configure your HdfsCluster to use the newly created SecretCl Please make sure to use the SecretClass named `kerberos`. It is also necessary to configure 2 additional things in HDFS: * Define group mappings for users with `hadoop.user.group.static.mapping.overrides` -* Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive`´to impersonate all other users. +* Tell HDFS that Hive is allowed to impersonate other users, i.e. Hive does not need any _direct_ access permissions for itself, but should be able to impersonate Hive users when accessing HDFS. This can be done by e.g. setting `hadoop.proxyuser.hive.users=*` and `hadoop.proxyuser.hive.hosts=*` to allow the user `hive` to impersonate all other users. An example of the above can be found in this https://github.com/stackabletech/hive-operator/blob/main/tests/templates/kuttl/kerberos-hdfs/30-install-hdfs.yaml.j2[integration test].