
feat[frontend]: implement artifact-repositories configmap support #11354

Status: Open. Wants to merge 1 commit into base: master.

Conversation


@droctothorpe droctothorpe commented Nov 5, 2024

Description of your changes:

Argo Workflows supports specifying an artifact repository for archived logs per namespace, rather than only globally, as documented here.

This PR adds support for this functionality to the KFP UI. If a workflow runs and...

  1. The corresponding pods are deleted
  2. The corresponding workflow is deleted
  3. There is an artifact-repositories configmap in the target namespace that provides a unique, namespaced keyFormat

...this PR ensures that logs are still accessible in the UI. This functionality is disabled by default to avoid unnecessary network calls (for end users who don't leverage namespace-specific artifact-repositories). It can be toggled on through configs.ts or an environment variable.
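As a rough sketch of what an env-var toggle like this can look like (the helper below is illustrative and is not the PR's actual configs.ts code; ARGO_ARTIFACT_REPOSITORIES_LOOKUP is the variable name that appears later in this thread):

```typescript
// Illustrative feature-flag helper: parse a boolean from an environment
// variable, defaulting to false so the lookup stays off unless enabled.
function parseBooleanFlag(value: string | undefined, defaultValue = false): boolean {
  if (value === undefined || value === '') return defaultValue;
  return value.toLowerCase() === 'true';
}

// Off by default; enabled by setting ARGO_ARTIFACT_REPOSITORIES_LOOKUP=true.
const artifactRepositoriesLookup = parseBooleanFlag(
  process.env.ARGO_ARTIFACT_REPOSITORIES_LOOKUP,
);
```

Defaulting to false matches the PR's stated goal of avoiding unnecessary network calls for users who don't use namespace-scoped artifact repositories.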

Co-authored with @quinnovator 🌟 .

Fixes: #11339

Checklist:


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: droctothorpe <[email protected]>
Co-authored-by: quinnovator <[email protected]>
@github-actions github-actions bot added ci-passed All CI tests on a pull request have passed and removed ci-passed All CI tests on a pull request have passed labels Nov 5, 2024
@github-actions github-actions bot added the ci-passed All CI tests on a pull request have passed label Nov 5, 2024
@HumairAK HumairAK self-assigned this Nov 6, 2024

droctothorpe commented Nov 6, 2024

@HumairAK to recreate the edge case:

  1. Deploy KFP to a local K8s cluster.
  2. Set ARGO_ARCHIVE_LOGS=true in configs.ts.
  3. cd into the frontend directory.
  4. Run NAMESPACE=kubeflow npm run start:proxy-and-server to start the UI on your host machine.
  5. Create the following configmap in the kubeflow namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    workflows.argoproj.io/default-artifact-repository: artifact-repositories
  name: artifact-repositories
  namespace: kubeflow
data:
  artifact-repositories: |-
    archiveLogs: true
    s3:
      accessKeySecret:
        key: accesskey
        name: mlpipeline-minio-artifact
      bucket: mlpipeline
      endpoint: minio-service.kubeflow:9000
      insecure: true
      keyFormat: foo
      secretKeySecret:
        key: secretkey
        name: mlpipeline-minio-artifact
  6. Run a simple pipeline.
  7. Once it completes, delete the AWF workflow custom resource associated with the run.
  8. Try to access the logs of the run in the UI running on your host machine. On master, this operation fails; on this branch, it succeeds.
  9. The next few steps are optional. Port-forward the minio pod.
  10. Navigate to the minio UI.
  11. Log in (username: minio, password: minio123).
  12. Note how the logs are stored in mlpipeline/foo rather than under the default keyFormat specified in configs.ts. Argo gives precedence to the keyFormat in the artifact-repositories configmap over the keyFormat in workflow-controller-configmap.
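The keyFormat behavior in the last step can be illustrated with a small sketch. Argo expands {{...}} placeholders in keyFormat when writing objects; the render function below is purely illustrative (not Argo's or KFP's actual code), but it shows why a keyFormat of foo puts the logs at mlpipeline/foo while the default keyFormat scatters them under per-workflow paths:

```typescript
// Illustrative expansion of Argo-style keyFormat placeholders such as
// {{workflow.name}} or {{pod.name}} into a concrete object-store key.
function renderKeyFormat(keyFormat: string, vars: Record<string, string>): string {
  return keyFormat.replace(/\{\{([^}]+)\}\}/g, (_match, name: string) => {
    const value = vars[name.trim()];
    if (value === undefined) throw new Error(`unresolved placeholder: ${name}`);
    return value;
  });
}

// A literal keyFormat like "foo" contains no placeholders, so it renders
// to the same key regardless of workflow or pod.
renderKeyFormat('foo', {}); // -> 'foo'

// A templated keyFormat resolves per workflow/pod.
renderKeyFormat('artifacts/{{workflow.name}}/{{pod.name}}', {
  'workflow.name': 'my-run',
  'pod.name': 'my-run-pod-1',
}); // -> 'artifacts/my-run/my-run-pod-1'
```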


droctothorpe commented Nov 8, 2024

I updated the instructions for reproducing the edge case to make manual validation much easier by running the UI on your host machine.


HumairAK commented Nov 13, 2024

Hey @droctothorpe /@quinnovator, thanks for this. I'm trying this out now and encountering some hiccups. I suspect it might be my environment/setup, but I want to confirm with you first.

  1. When building the image and trying this out with a pod deployment of the UI, I'm not seeing it work. What I see:

[screenshot]

The UI deployment YAML is as posted below:

deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
    deployment.kubernetes.io/revision: '3'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"},"name":"ml-pipeline-ui","namespace":"kubeflow"},"spec":{"selector":{"matchLabels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"}},"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"},"labels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"}},"spec":{"containers":[{"env":[{"name":"DISABLE_GKE_METADATA","value":"true"},{"name":"VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH","value":"/etc/config/viewer-pod-template.json"},{"name":"MINIO_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"MINIO_ACCESS_KEY","valueFrom":{"secretKeyRef":{"key":"accesskey","name":"mlpipeline-minio-artifact"}}},{"name":"MINIO_SECRET_KEY","valueFrom":{"secretKeyRef":{"key":"secretkey","name":"mlpipeline-minio-artifact"}}},{"name":"ALLOW_CUSTOM_VISUALIZATIONS","value":"true"},{"name":"FRONTEND_SERVER_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}}],"image":"gcr.io/ml-pipeline/frontend:2.3.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:3000/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2},"name":"ml-pipeline-ui","ports":[{"containerPort":3000}],"readinessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:3000/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2},"resources":{"requests":{"cpu":"10m","memory":"70Mi"}},"volumeMounts":[{"mountPath":"/etc/config","name":"config-volume","readOnly":true}]}],"serviceAccountName":"ml-pipeline-ui","volumes":[{"configMap":{"name":"ml-pipeline-ui-configmap"},"name":"config-volume"}]}}}}
  resourceVersion: '627852'
  name: ml-pipeline-ui
  uid: 4d454d73-05f6-4b01-b744-8a4c9aec05ec
  creationTimestamp: '2024-11-12T22:37:37Z'
  generation: 8
  namespace: kubeflow
  labels:
    app: ml-pipeline-ui
    application-crd-id: kubeflow-pipelines
spec:
  replicas: 0
  selector:
    matchLabels:
      app: ml-pipeline-ui
      application-crd-id: kubeflow-pipelines
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ml-pipeline-ui
        application-crd-id: kubeflow-pipelines
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
    spec:
      restartPolicy: Always
      serviceAccountName: ml-pipeline-ui
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources:
            requests:
              cpu: 10m
              memory: 70Mi
          readinessProbe:
            exec:
              command:
                - wget
                - '-q'
                - '-S'
                - '-O'
                - '-'
                - 'http://localhost:3000/apis/v1beta1/healthz'
            initialDelaySeconds: 3
            timeoutSeconds: 2
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: ml-pipeline-ui
          livenessProbe:
            exec:
              command:
                - wget
                - '-q'
                - '-S'
                - '-O'
                - '-'
                - 'http://localhost:3000/apis/v1beta1/healthz'
            initialDelaySeconds: 3
            timeoutSeconds: 2
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: DISABLE_GKE_METADATA
              value: 'true'
            - name: VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH
              value: /etc/config/viewer-pod-template.json
            - name: MINIO_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: MINIO_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: mlpipeline-minio-artifact
                  key: accesskey
            - name: MINIO_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: mlpipeline-minio-artifact
                  key: secretkey
            - name: ALLOW_CUSTOM_VISUALIZATIONS
              value: 'true'
            - name: FRONTEND_SERVER_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: ARGO_ARTIFACT_REPOSITORIES_LOOKUP
              value: 'true'
            - name: ARGO_ARCHIVE_LOGS
              value: 'true'
          ports:
            - containerPort: 3000
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config-volume
              readOnly: true
              mountPath: /etc/config
          terminationMessagePolicy: File
          image: 'quay.io/hukhan/ds-pipelines-frontend:11354'
      serviceAccount: ml-pipeline-ui
      volumes:
        - name: config-volume
          configMap:
            name: ml-pipeline-ui-configmap
            defaultMode: 420
      dnsPolicy: ClusterFirst
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 8
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2024-11-13T04:36:07Z'
      lastTransitionTime: '2024-11-12T22:37:37Z'
      reason: NewReplicaSetAvailable
      message: ReplicaSet "ml-pipeline-ui-5c5479dc67" has successfully progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2024-11-13T22:54:29Z'
      lastTransitionTime: '2024-11-13T22:54:29Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.

And the artifact repository configmap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: artifact-repositories
  namespace: kubeflow
data:
  artifact-repositories: |-
    archiveLogs: true
    s3:
      accessKeySecret:
        key: accesskey
        name: mlpipeline-minio-artifact
      bucket: mlpipeline
      endpoint: minio-service.kubeflow:9000
      insecure: true
      keyFormat: custom_location/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
      secretKeySecret:
        key: secretkey
        name: mlpipeline-minio-artifact

As you can see I have set ARGO_ARTIFACT_REPOSITORIES_LOOKUP and ARGO_ARCHIVE_LOGS to true. I can also confirm it's got the PR changes by viewing the UI pod container logs, the UI config has:

{
  argo: {
    archiveArtifactory: 'minio',
    archiveBucketName: 'mlpipeline',
    archiveLogs: true,
    keyFormat: 'artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}',
    artifactRepositoriesLookup: true

^^^^^ PR change (excerpt truncated here)
  2. When trying this locally, I see some strange behavior where one of the task nodes fetches logs from the object store but the other does not. In some scenarios both fail. My suspicion is that some faulty timeout behavior in my connection to the object store is to blame, but I'm wondering if you encounter this at all:

[screenshot: logs-not-pulled]

The pipeline used:

pipeline.yaml
# PIPELINE DEFINITION
# Name: tutorial-data-passing-2
# Inputs:
#    message: str [Default: 'message']
components:
  comp-preprocess:
    executorLabel: exec-preprocess
    inputDefinitions:
      parameters:
        message:
          parameterType: STRING
    outputDefinitions:
      artifacts:
        output_dataset_one:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
        output_dataset_two_path:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
      parameters:
        output_bool_parameter_path:
          parameterType: BOOLEAN
        output_dict_parameter_path:
          parameterType: STRUCT
        output_list_parameter_path:
          parameterType: LIST
        output_parameter_path:
          parameterType: STRING
  comp-train:
    executorLabel: exec-train
    inputDefinitions:
      artifacts:
        dataset_one_path:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
        dataset_two:
          artifactType:
            schemaTitle: system.Dataset
            schemaVersion: 0.0.1
      parameters:
        input_bool:
          parameterType: BOOLEAN
        input_dict:
          parameterType: STRUCT
        input_list:
          parameterType: LIST
        message:
          parameterType: STRING
        num_steps:
          defaultValue: 100.0
          isOptional: true
          parameterType: NUMBER_INTEGER
    outputDefinitions:
      artifacts:
        model:
          artifactType:
            schemaTitle: system.Model
            schemaVersion: 0.0.1
deploymentSpec:
  executors:
    exec-preprocess:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - preprocess
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.3.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef preprocess(\n    # An input parameter of type string.\n    message:\
          \ str,\n    # Use Output[T] to get a metadata-rich handle to the output\
          \ artifact\n    # of type `Dataset`.\n    output_dataset_one: Output[Dataset],\n\
          \    # A locally accessible filepath for another output artifact of type\n\
          \    # `Dataset`.\n    output_dataset_two_path: OutputPath('Dataset'),\n\
          \    # A locally accessible filepath for an output parameter of type string.\n\
          \    output_parameter_path: OutputPath(str),\n    # A locally accessible\
          \ filepath for an output parameter of type bool.\n    output_bool_parameter_path:\
          \ OutputPath(bool),\n    # A locally accessible filepath for an output parameter\
          \ of type dict.\n    output_dict_parameter_path: OutputPath(Dict[str, int]),\n\
          \    # A locally accessible filepath for an output parameter of type list.\n\
          \    output_list_parameter_path: OutputPath(List[str]),\n):\n    \"\"\"\
          Dummy preprocessing step.\"\"\"\n\n    # Use Dataset.path to access a local\
          \ file path for writing.\n    # One can also use Dataset.uri to access the\
          \ actual URI file path.\n    with open(output_dataset_one.path, 'w') as\
          \ f:\n        f.write(message)\n\n    # OutputPath is used to just pass\
          \ the local file path of the output artifact\n    # to the function.\n \
          \   with open(output_dataset_two_path, 'w') as f:\n        f.write(message)\n\
          \n    with open(output_parameter_path, 'w') as f:\n        f.write(message)\n\
          \n    with open(output_bool_parameter_path, 'w') as f:\n        f.write(\n\
          \            str(True))  # use either `str()` or `json.dumps()` for bool\
          \ values.\n\n    import json\n    with open(output_dict_parameter_path,\
          \ 'w') as f:\n        f.write(json.dumps({'A': 1, 'B': 2}))\n\n    with\
          \ open(output_list_parameter_path, 'w') as f:\n        f.write(json.dumps(['a',\
          \ 'b', 'c']))\n\n"
        image: quay.io/opendatahub/ds-pipelines-ci-executor-image:v1.0
    exec-train:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - train
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
          \ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.3.0'\
          \ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
          $0\" \"$@\"\n"
        - sh
        - -ec
        - 'program_path=$(mktemp -d)


          printf "%s" "$0" > "$program_path/ephemeral_component.py"

          _KFP_RUNTIME=true python3 -m kfp.dsl.executor_main                         --component_module_path                         "$program_path/ephemeral_component.py"                         "$@"

          '
        - "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
          \ *\n\ndef train(\n    # Use InputPath to get a locally accessible path\
          \ for the input artifact\n    # of type `Dataset`.\n    dataset_one_path:\
          \ InputPath('Dataset'),\n    # Use Input[T] to get a metadata-rich handle\
          \ to the input artifact\n    # of type `Dataset`.\n    dataset_two: Input[Dataset],\n\
          \    # An input parameter of type string.\n    message: str,\n    # Use\
          \ Output[T] to get a metadata-rich handle to the output artifact\n    #\
          \ of type `Model`.\n    model: Output[Model],\n    # An input parameter\
          \ of type bool.\n    input_bool: bool,\n    # An input parameter of type\
          \ dict.\n    input_dict: Dict[str, int],\n    # An input parameter of type\
          \ List[str].\n    input_list: List[str],\n    # An input parameter of type\
          \ int with a default value.\n    num_steps: int = 100,\n):\n    \"\"\"Dummy\
          \ Training step.\"\"\"\n    with open(dataset_one_path, 'r') as input_file:\n\
          \        dataset_one_contents = input_file.read()\n\n    with open(dataset_two.path,\
          \ 'r') as input_file:\n        dataset_two_contents = input_file.read()\n\
          \    print(\"test3\")\n    line = (f'dataset_one_contents: {dataset_one_contents}\
          \ || '\n            f'dataset_two_contents: {dataset_two_contents} || '\n\
          \            f'message: {message} || '\n            f'input_bool: {input_bool},\
          \ type {type(input_bool)} || '\n            f'input_dict: {input_dict},\
          \ type {type(input_dict)} || '\n            f'input_list: {input_list},\
          \ type {type(input_list)} \\n')\n\n    with open(model.path, 'w') as output_file:\n\
          \        for i in range(num_steps):\n            output_file.write('Step\
          \ {}\\n{}\\n=====\\n'.format(i, line))\n\n    # model is an instance of\
          \ Model artifact, which has a .metadata dictionary\n    # to store arbitrary\
          \ metadata for the output artifact.\n    model.metadata['accuracy'] = 0.9\n\
          \n"
        image: quay.io/opendatahub/ds-pipelines-ci-executor-image:v1.0
pipelineInfo:
  name: tutorial-data-passing-2
root:
  dag:
    tasks:
      preprocess:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-preprocess
        inputs:
          parameters:
            message:
              componentInputParameter: message
        taskInfo:
          name: preprocess
      train:
        cachingOptions:
          enableCache: true
        componentRef:
          name: comp-train
        dependentTasks:
        - preprocess
        inputs:
          artifacts:
            dataset_one_path:
              taskOutputArtifact:
                outputArtifactKey: output_dataset_one
                producerTask: preprocess
            dataset_two:
              taskOutputArtifact:
                outputArtifactKey: output_dataset_two_path
                producerTask: preprocess
          parameters:
            input_bool:
              taskOutputParameter:
                outputParameterKey: output_bool_parameter_path
                producerTask: preprocess
            input_dict:
              taskOutputParameter:
                outputParameterKey: output_dict_parameter_path
                producerTask: preprocess
            input_list:
              taskOutputParameter:
                outputParameterKey: output_list_parameter_path
                producerTask: preprocess
            message:
              taskOutputParameter:
                outputParameterKey: output_parameter_path
                producerTask: preprocess
        taskInfo:
          name: train
  inputDefinitions:
    parameters:
      message:
        defaultValue: message
        isOptional: true
        parameterType: STRING
schemaVersion: 2.1.0
sdkVersion: kfp-2.3.0

So, a couple of questions for you:

  1. Did you have this working with a built image running in a pod? If so, I suspect I have something configured incorrectly.
  2. Does the above pipeline work fine for you when you switch between the two tasks' pod logs (after deleting the underlying workflow)? (Or any other pipeline with more than one task node that produces logs.)

@droctothorpe

Thank you so much for taking the time to validate this, @HumairAK, and for being so thorough with documenting the issue that you encountered.

I'm going to try a couple of things to see if I can reproduce your results:

  1. Running the UI on host (using NAMESPACE=kubeflow npm run start:proxy-and-server), validate that I can toggle between multiple tasks.
  2. Running the UI on host, use your pipeline.
  3. Same as above but also use your configmap.
  4. Same as above but build and deploy the UI image instead of running it on the host.

Results:

  1. Running the UI on host, validate that I can toggle between multiple tasks.
    Worked. [screenshots]
  2. Running the UI on host, use your pipeline.
    Worked. [screenshots]
  3. Same as above but also use your configmap.

I think I see the problem. I did a comparison of the two configmaps:

[screenshot: diff of the two configmaps]

Your configmap is missing the workflows.argoproj.io/default-artifact-repository: artifact-repositories annotation. From the AWF documentation:
[screenshot: AWF documentation excerpt]
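The role of the annotation can be sketched as follows: Argo uses the workflows.argoproj.io/default-artifact-repository annotation to select which key in the configmap's data holds the default repository definition, so without it the lookup resolves nothing. The types and function below are illustrative only, not the PR's actual code:

```typescript
// Minimal shape of the parts of a ConfigMap this sketch needs.
interface ConfigMapLike {
  metadata: { annotations?: Record<string, string> };
  data: Record<string, string>;
}

const DEFAULT_REPO_ANNOTATION = 'workflows.argoproj.io/default-artifact-repository';

// Return the raw YAML string of the default artifact repository, if the
// annotation names a key present in the configmap's data; otherwise undefined.
function defaultRepositoryYaml(cm: ConfigMapLike): string | undefined {
  const key = cm.metadata.annotations?.[DEFAULT_REPO_ANNOTATION];
  // Without the annotation there is no default repository to resolve,
  // which matches the failure mode seen with the annotation-less configmap.
  if (!key) return undefined;
  return cm.data[key];
}
```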

Can you run the same test but make sure to include that annotation in your configmap?

Thank you! 🙏

Labels: ci-passed (All CI tests on a pull request have passed), size/L
Successfully merging this pull request may close these issues.

[frontend] The UI does not account for the artifact-repositories configmap