# Addon Hive-operator #655

Draft · wants to merge 19 commits into base: master (showing changes from 17 commits)
24 changes: 24 additions & 0 deletions examples/hive-operator/hive-cluster.yaml
@@ -0,0 +1,24 @@
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hive-postgres-cluster
spec:
  components:
    - type: hive-cluster
      name: hive-postgres-cluster
      properties:
        image:
          productVersion: 3.1.3
          stackableVersion: 23.1.0
        clusterConfig:
          database:
            connString: jdbc:postgresql://postgresql:5432/hive
            user: hive
            password: hive
            dbType: postgres
          s3:
            reference: minio
        metastore:
          roleGroups:
            default:
              replicas: 1
14 changes: 14 additions & 0 deletions examples/hive-operator/s3-connection.yaml
@@ -0,0 +1,14 @@
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: s3-connection-sample
spec:
  components:
    - type: s3-connection
      name: minio
      properties:
        host: minio
        port: 9000
        accessStyle: Path
        credentials:
          secretClass: hive-s3-secret-class
13 changes: 13 additions & 0 deletions examples/hive-operator/secret-class.yaml
@@ -0,0 +1,13 @@
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: secret-class-sample
spec:
  components:
    - type: secret-class
      name: hive-s3-secret-class
      properties:
        backend:
          k8sSearch:
            searchNamespace:
              pod: {}
175 changes: 175 additions & 0 deletions experimental/addons/hive-operator/README.md
@@ -0,0 +1,175 @@
# hive-operator

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. The Hive Metastore (HMS) provides a central repository of metadata that can easily be analyzed to make informed, data-driven decisions, making it a critical component of many data lake architectures. Hive is built on top of Apache Hadoop and supports storage on S3, ADLS, GCS, etc. through HDFS. Hive allows users to read, write, and manage petabytes of data using SQL.

This is a Kubernetes operator that can manage Apache Hive. Currently, it only supports the Hive Metastore.

## Install Operator

Add experimental registry
```
vela addon registry add experimental --type=helm --endpoint=https://addons.kubevela.net/experimental/
```

Enable this addon
```
vela addon enable hive-operator
```

```shell
$ vela ls -A | grep hive
vela-system   addon-hive-operator   ns-hive-operator   k8s-objects   running   healthy
vela-system   └─ hive-operator                         helm          running   healthy   Fetch repository successfully, Create helm release
```

> **Review note (Member):** can you try `vela status`?

Disable this addon
```
vela addon disable hive-operator
```

## Install Dependencies

In order to install the MinIO and PostgreSQL dependencies via Helm, deploy the following two charts.

**MinIO**

```shell
helm install minio \
  --namespace prod \
  --version 4.0.2 \
  --set mode=standalone \
  --set replicas=1 \
  --set persistence.enabled=false \
  --set buckets[0].name=hive,buckets[0].policy=none \
  --set users[0].accessKey=hive,users[0].secretKey=hivehive,users[0].policy=readwrite \
  --set resources.requests.memory=1Gi \
  --set service.type=NodePort,service.nodePort=null \
  --set consoleService.type=NodePort,consoleService.nodePort=null \
  --repo https://charts.min.io/ minio
```

> **Review thread:**
>
> **Collaborator:** 1. Can we use a vela Application or leverage the addon for this installation? 2. The further question is: if I were the user and have already learned `helm install`, why wouldn't I use `helm install` to install Hive?
>
> **Contributor (author):** Extended the addon capability to install dependencies at the time of addon installation.
>
> **@charlie0129 (Member, Mar 20, 2023):** Sorry, but I don't think this conversation is resolved yet. What @wonderflow means is that this addon needs distinct advantages that benefit users compared to Helm; otherwise the user would choose Helm instead, not just on the first hands-on but also on subsequent deployments. I don't have good knowledge of Hive, but here are some possible improvements, which are just my thoughts on this particular issue. For example, vela Definitions provide an easy way to spin up complex things. In this particular addon, you have three components (hive-cluster, s3-connection, and secret-class) that must be applied in order to have a functional Hive cluster. Can this process be simplified? For example, can the user spin up a Hive cluster with just one well-designed component? That component would create several resources under the hood, abstracting away tedious details from the user while maintaining a certain level of customizability. This is what sets vela and vela addons apart from a typical Helm installation.
>
> **Contributor (author):** Thanks @charlie0129 for the explanation. It makes sense to have a simplified installation process. I think I should create a new component that creates all Hive-related resources at once and makes hive-cluster run in the cluster.

**PostgreSQL**

```shell
helm install postgresql \
  --version=12.1.5 \
  --namespace prod \
  --set postgresqlUsername=hive \
  --set postgresqlPassword=hive \
  --set postgresqlDatabase=hive \
  --repo https://charts.bitnami.com/bitnami postgresql
```

## Use

After going through the installation section above and installing all the dependencies, you can now deploy a Hive Metastore cluster and its dependencies, then verify that it works.

In order to connect Hive to MinIO, we need to create several dependent components: an S3Connection, a Secret, and a SecretClass.

**s3-connection**

An S3Connection to connect to MinIO. Apply the YAML below:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: s3-connection-sample
spec:
  components:
    - type: s3-connection
      name: minio
      properties:
        host: minio
        port: 9000
        accessStyle: Path
        credentials:
          secretClass: hive-s3-secret-class
```

> **Review note (@charlie0129, Member, Mar 20, 2023):** Why not put all these (including the single-component Applications below) into one single Application that the user can run and test? If used in this way, it is essentially the same as applying a bunch of YAMLs; the benefit of vela Applications is gone.

**secret**

Credentials for the S3Connection to log into MinIO. Apply the YAML below:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hive-secret-sample
spec:
  components:
    - type: k8s-objects
      name: k8s-demo-secret
      properties:
        objects:
          - apiVersion: v1
            kind: Secret
            metadata:
              name: hive-s3-secret
              labels:
                secrets.stackable.tech/class: hive-s3-secret-class
            stringData:
              accessKey: hive
              secretKey: hivehive
```

**secret-class**

A SecretClass for the MinIO credentials. The credentials were defined during the Helm installation of MinIO. Apply the YAML below:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: secret-class-sample
spec:
  components:
    - type: secret-class
      name: hive-s3-secret-class
      properties:
        backend:
          k8sSearch:
            searchNamespace:
              pod: {}
```

**hive-cluster**

And lastly, the actual Apache Hive cluster definition. Apply the YAML below:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hive-postgres-cluster
spec:
  components:
    - type: hive-cluster
      name: hive-postgres-cluster
      properties:
        image:
          productVersion: 3.1.3
          stackableVersion: 23.1.0
        clusterConfig:
          database:
            connString: jdbc:postgresql://postgresql:5432/hive
            user: hive
            password: hive
            dbType: postgres
          s3:
            reference: minio
        metastore:
          roleGroups:
            default:
              replicas: 1
```
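Alternatively, the separate single-component Applications above can be merged into one Application so that all the pieces deploy and get garbage-collected together. The sketch below simply combines the four components already shown, reusing the same names; it is an illustration, not a tested manifest from this addon:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: hive-stack
spec:
  components:
    - type: secret-class
      name: hive-s3-secret-class
      properties:
        backend:
          k8sSearch:
            searchNamespace:
              pod: {}
    - type: k8s-objects
      name: k8s-demo-secret
      properties:
        objects:
          - apiVersion: v1
            kind: Secret
            metadata:
              name: hive-s3-secret
              labels:
                secrets.stackable.tech/class: hive-s3-secret-class
            stringData:
              accessKey: hive
              secretKey: hivehive
    - type: s3-connection
      name: minio
      properties:
        host: minio
        port: 9000
        accessStyle: Path
        credentials:
          secretClass: hive-s3-secret-class
    - type: hive-cluster
      name: hive-postgres-cluster
      properties:
        image:
          productVersion: 3.1.3
          stackableVersion: 23.1.0
        clusterConfig:
          database:
            connString: jdbc:postgresql://postgresql:5432/hive
            user: hive
            password: hive
            dbType: postgres
          s3:
            reference: minio
        metastore:
          roleGroups:
            default:
              replicas: 1
```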

Verify that it works

```shell
$ kubectl get statefulset -n prod
NAME                                      READY   AGE
hive-postgres-cluster-metastore-default   1/1     76s
```

For more information, visit https://docs.stackable.tech/home/stable/hive/index.html.

> **Review note (Collaborator):** please provide some information here about how to use the hive cluster.
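The Hive Metastore only exposes the Thrift metadata API; query engines such as Trino or Spark connect to it to read table metadata. As a hypothetical illustration only (the Service name `hive-postgres-cluster-metastore`, the default Thrift port 9083, and the catalog wiring are assumptions, not taken from this addon; check `kubectl get svc -n prod` for the actual name), a Trino Hive catalog pointing at the metastore and MinIO could be shipped in a ConfigMap like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-hive-catalog   # hypothetical name
  namespace: prod
data:
  hive.properties: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-postgres-cluster-metastore.prod.svc.cluster.local:9083
    hive.s3.endpoint=http://minio.prod.svc.cluster.local:9000
    hive.s3.path-style-access=true
    hive.s3.aws-access-key=hive
    hive.s3.aws-secret-key=hivehive
```

The access key and secret key match the values passed to the MinIO Helm chart above.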

77 changes: 77 additions & 0 deletions experimental/addons/hive-operator/definitions/hive-cluster.cue
@@ -0,0 +1,77 @@
"hive-cluster": {
alias: ""
annotations: {}
attributes: workload: type: "autodetects.core.oam.dev"
description: "s3 bucket component"
labels: {}
type: "component"
}

template: {
output: {
kind: "HiveCluster"
apiVersion: "hive.stackable.tech/v1alpha1"
metadata: {
name: context.name
}
spec: {
image: parameter.image
clusterConfig: parameter.clusterConfig
metastore: parameter.metastore
}
}
parameter: {
//+usage=The Hive metastore image to use.
image: {
//+usage=Overwrite the docker image. Specify the full docker image name, e.g. `docker.stackable.tech/stackable/superset:1.4.1-stackable2.1.0
custom: *null | string
//+usage=Version of the product, e.g. `1.4.1`.
productVersion: *null | string
//+usage=Pull policy used when pulling the Images.
pullPolicy: *"IfNotPresent" | string
//+usage=Image pull secrets to pull images from a private registry.
pullSecrets: *null | [...]
//+usage=Name of the docker repo, e.g. `docker.stackable.tech/stackable.
repo: *null | string
//+usage=Stackable version of the product, e.g. 2.1.0.
stackableVersion: *null | string
}
//+usage=General Hive metastore cluster settings.
clusterConfig: {
//+usage=Database connection specification.
database: {
connString: *null | string
dbType: *null | string
password: *null | string
user: *null | string
}
//+usage=HDFS connection specification.
hdfs: *null | {...}
//+usage=S3 connection specification.
s3: {
//+usage=S3 connection definition as CRD.
inline: *null | {...}
reference: *null | string
}
//+usage=Specify the type of the created kubernetes service. This attribute will be removed in a future release when listener-operator is finished. Use with caution.
serviceType: *"ClusterIP" | string
//+usage=Name of the Vector aggregator discovery ConfigMap. It must contain the key `ADDRESS` with the address of the Vector aggregator.
vectorAggregatorConfigMapName: *null | string
}
//+usage=Configure metastore.
metastore: {
//+usage=Name of the discovery-configmap providing information about the HDFS cluster.
cliOverrides: *{} | {...}
//+usage=Name of the discovery-configmap providing information about the HDFS cluster.
config: *{} | {...}
//+usage=Name of the discovery-configmap providing information about the HDFS cluster.
configOverrides: *{} | {...}
//+usage=Name of the discovery-configmap providing information about the HDFS cluster.
envOverrides: *{} | {...}
//+usage=Name of the discovery-configmap providing information about the HDFS cluster.
roleGroups: *{} | {...}
}
//+usage=Emergency stop button, if `true` then all pods are stopped without affecting configuration (as setting `replicas` to `0` would.
stopped: *null | bool
}
}
33 changes: 33 additions & 0 deletions experimental/addons/hive-operator/definitions/s3-bucket.cue
@@ -0,0 +1,33 @@
"s3-bucket": {
alias: ""
annotations: {}
attributes: workload: type: "autodetects.core.oam.dev"
description: "s3 bucket component"
labels: {}
type: "component"
}

template: {
output: {
kind: "S3Bucket"
apiVersion: "s3.stackable.tech/v1alpha1"
metadata: {
name: context.name
}
spec: {
bucketName: parameter.bucketName
connection: parameter.connection
}
}
parameter: {
//+usage=the name of the Bucket.
bucketName: *null | string
//+usage=can either be inline or reference.
connection: {
//+usage=the name of the Bucket.
inline: *null | {...}
//+usage=the name of the referenced S3Connection resource, which must be in the same namespace as the S3Bucket resource.
reference: *null | string
}
}
}
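The `s3-bucket` component is not exercised in the README examples above. A hypothetical Application using it, assuming the `minio` S3Connection from the examples and the `hive` bucket created by the MinIO chart (both names taken from this PR's examples, the combination itself is untested):

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: s3-bucket-sample
spec:
  components:
    - type: s3-bucket
      name: hive-bucket
      properties:
        bucketName: hive
        connection:
          reference: minio
```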
37 changes: 37 additions & 0 deletions experimental/addons/hive-operator/definitions/s3-connection.cue
@@ -0,0 +1,37 @@
"s3-connection": {
alias: ""
annotations: {}
attributes: workload: type: "autodetects.core.oam.dev"
description: "s3 connection component"
labels: {}
type: "component"
}

template: {
output: {
kind: "S3Connection"
apiVersion: "s3.stackable.tech/v1alpha1"
metadata: {
name: context.name
}
spec: {
host: parameter.host
port: parameter.port
accessStyle: parameter.accessStyle
credentials: parameter.credentials
}
}
parameter: {
//+usage=the domain name of the host of the object store, such as s3.west.provider.com.
host: *null | string
//+usage=a port such as 80 or 4242.
port: *null | int
//+usage=Optional. Can be either "VirtualHosted" (default) or "Path".
accessStyle: *"VirtualHosted" | string
//+usage=contains a secretClass.
credentials: {
//+usage=a reference to a SecretClass resource
secretClass: *null | string
}
}
}