Topo Workflows

Topo workflows run on an AWS EKS cluster using Argo Workflows. The detailed configuration is available in this repo.

To get set up, you need access to the Argo user role inside the EKS cluster. Contact someone from Topo Data Engineering to request access; all Imagery maintainers already have it.

If you are creating your own workflow, or are interested in the details of a current workflow, please also read CONFIGURATION.md.

Setup

You will need

Ensure you have kubectl aliased to k

alias k=kubectl

To connect to the EKS cluster you need to be logged into AWS

aws-azure-login

Then set up the cluster. You only need to run this the first time you use the cluster:

aws --region=ap-southeast-2 eks update-kubeconfig --name=Workflows

To validate that the cluster is connected:

k get nodes

NAME                                               STATUS   ROLES    AGE    VERSION
ip-255-100-38-100.ap-southeast-2.compute.internal   Ready    <none>   7d   v1.21.12-eks-5308cf7
ip-255-100-39-100.ap-southeast-2.compute.internal   Ready    <none>   7d   v1.21.12-eks-5308cf7

To make CLI access easier, you can set the default namespace to argo:

k config set-context --current --namespace=argo
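Before running the setup commands, it can help to confirm that the command-line tools are installed. The helper below is a minimal sketch: `missing_tools` is a hypothetical name, and the tool list in the usage note is an assumption based on the commands used in this README.

```shell
# Print each required command that is not found on PATH, one per line.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools kubectl aws aws-azure-login argo` lists whichever of those tools still need to be installed.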

Submitting a job

Once the cluster connection is set up, a job can be submitted with the CLI or accessed via the running argo-server:

argo submit --watch workflows/raster/standardising.yaml

To open the web interface:

# Create a connection to the Argo server
k port-forward deployment/argo-workflows-server 2746:2746

xdg-open http://localhost:2746

Submit a Job Using the Argo UI

In the Workflows page:

  1. SUBMIT NEW WORKFLOW
  2. Edit using full workflow options
  3. UPLOAD FILE
  4. (Locate File -> Open)
  5. + CREATE

Debugging Argo Workflows

Workflow Parameters

WorkflowParameters

Workflow Logs

WorkflowLogs

Logs in Elasticsearch

Elasticsearch is an analytics engine that allows us to store, search and analyse AWS logs.
Elasticsearch can be accessed through https://myapplications.microsoft.com/.

Example Filters:

⚠️ Make sure you are using the workflow data view and set the correct time filter.

All Logs for a Workflow:

kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq"

All Logs for a pod:
Click on the pod in the Argo UI and scroll through the summary table to find the pod name.

kubernetes.annotations.workflows.argoproj.io/node-name.keyword : "imagery-standardising-v0.2.0-60-9b7dq.create-config"

List Failed STAC Validation Logs:

kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq" and data.valid : False

Find a Basemaps URL:

kubernetes.labels.workflows.argoproj.io/workflow : "imagery-standardising-v0.2.0-60-9b7dq" and data.url : *

or

data.title : "Wellington Urban Aerial Photos (1987-1988) SN8790" and data.url : *
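The workflow filters above differ only in the workflow name, so they can be generated rather than edited by hand. This is a small sketch; the function names `workflow_filter` and `failed_stac_filter` are hypothetical helpers, not part of this repo.

```shell
# Build the "all logs for a workflow" filter for a given workflow name.
workflow_filter() {
  printf 'kubernetes.labels.workflows.argoproj.io/workflow : "%s"\n' "$1"
}

# Build the "failed STAC validation" filter by extending the workflow filter.
failed_stac_filter() {
  printf '%s and data.valid : False\n' "$(workflow_filter "$1")"
}
```

Running `failed_stac_filter imagery-standardising-v0.2.0-60-9b7dq` prints a filter that can be pasted into the Elasticsearch search bar.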

Container version used

The kubernetes.container_hash field, available in Elasticsearch, gives the hash of the container that was used to run the task. It can be used to look up the version in the container registry for further investigation.

Workflow Artifacts

All workflow outputs and logs are stored in the linz-workflow-artifacts bucket on the li-topo-prod account.

All outputs follow the same naming convention:

s3://linz-workflow-artifacts/YYYY-mm/dd-workflow.name/pod.name/

For each pod the logs are saved as a main.log file within the related pod.name prefix.

Unless a different location is specified within the workflow code, output files will be uploaded to the corresponding pod.name prefix.
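As an illustration of the naming convention, the sketch below assembles the prefix for one pod. `artifact_prefix` is a hypothetical helper, and it assumes the caller already knows the date that appears in the prefix.

```shell
# Build the S3 prefix for a pod's artifacts.
# Arguments: date (YYYY-mm-dd), workflow name, pod name.
artifact_prefix() {
  date_ymd=$1 workflow=$2 pod=$3
  year_month=${date_ymd%-*}   # YYYY-mm
  day=${date_ymd##*-}         # dd
  printf 's3://linz-workflow-artifacts/%s/%s-%s/%s/\n' \
    "$year_month" "$day" "$workflow" "$pod"
}
```

The result can be passed to `aws s3 ls` to list a pod's outputs, including its main.log file.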

Note: This bucket has a 90 day expiration lifecycle.

Connecting to a Container

List pods:

k get pods --namespace=argo
# note: if the default namespace is set to argo, `--namespace=argo` is not required.

In the output next to the NAME of the pod, the READY column indicates how many Docker containers are running inside the pod. For example, 1/1 indicates there is one Docker container.

The output of the following command includes a Containers section. The first line in this section is the container name, for example, argo-server.

k describe pods *pod_name* --namespace=argo

To access a container in a pod run:

k exec --namespace=argo --stdin=true --tty=true *pod_name* -- bash

Once inside the container you can run a number of commands. For example, if troubleshooting network issues, you could run the following:

mtr linz-workflow-artifacts.s3.ap-southeast-2.amazonaws.com
mtr sts.ap-southeast-2.amazonaws.com
watch --errexit nslookup linz-workflow-artifacts.s3.ap-southeast-2.amazonaws.com

Concurrency

See Concurrency for details on how to set limits on how many workflow instances can be run concurrently.

FAQ

error: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"

Upgrade the AWS CLI to a version newer than 2.7.x.

Using containers

Some tasks in the Workflows or WorkflowTemplates run in a container. These containers are built from other repositories, such as https://github.com/linz/topo-imagery, https://github.com/linz/argo-tasks or https://github.com/linz/basemaps. Different tags are published for each of these containers:

  • latest
  • vX.Y.Z
  • vX.Y
  • vX

The container versions are managed by a workflow parameter that needs to be specified when submitting the workflow. The default value is the latest major version of the container. Using the major version tag (vX) with imagePullPolicy: Always ensures that a workflow using these containers always picks up the newest minor and patch releases within that major version.

:latest

This tag should never be used in production as it points to the latest build of the container which could be an unstable version. We reserve this tag for testing purposes.

:vX.Y.Z, :vX.Y, :vX

These tags are intended to be used in production, as they are published for each stable release of the container.

  • :vX.Y moves to the newest release each time Z is incremented.
  • :vX moves to the newest release each time Y or Z is incremented.
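To make the tag scheme concrete, the sketch below derives the three tags that a single release is published under. `release_tags` is a hypothetical helper, and `v1.4.2` in the usage note is an example version, not a real release.

```shell
# Print the three tags that a full version such as v1.4.2 is published under.
release_tags() {
  version=${1#v}             # strip the leading "v" -> 1.4.2
  major=${version%%.*}       # 1
  minor_patch=${version#*.}  # 4.2
  minor=${minor_patch%%.*}   # 4
  printf 'v%s\nv%s.%s\nv%s\n' "$version" "$major" "$minor" "$major"
}
```

`release_tags v1.4.2` prints `v1.4.2`, `v1.4` and `v1`. Only the first tag is fixed to that release; the other two are reassigned when a newer release in the same series is published.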

Using different versions

For each Workflow and WorkflowTemplate, there is a version_* parameter that lets you specify which version of the LINZ container to use.
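A version parameter can be overridden at submission time with the `-p` (`--parameter`) flag of `argo submit`. The sketch below only prints the command so it can be reviewed before running. `submit_with_version` is a hypothetical helper, and `version_argo_tasks` is an assumed parameter name; check the workflow file for the actual version_* parameters it defines.

```shell
# Print (rather than run) an argo submit command that pins one container version.
# Arguments: workflow file, version parameter name, container tag.
submit_with_version() {
  printf 'argo submit --watch %s -p %s=%s\n' "$1" "$2" "$3"
}
```

For example, `submit_with_version workflows/raster/standardising.yaml version_argo_tasks v2` prints a command that would run the workflow with the `v2` tag of that container.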