
artifact upload to object storage from the last step in a pipeline doesn't work #214

Closed
gregsheremeta opened this issue Jul 12, 2023 · 7 comments · Fixed by #212
Labels
  • field-priority: Identified as a high priority issue by users in the field.
  • kind/bug: Something isn't working
  • priority/blocker: Critical issue that needs to be fixed ASAP; blocks upcoming releases
  • triage/accepted

Comments

@gregsheremeta
Contributor

/kind bug

What steps did you take and what happened:
Returning large artifacts (like models) from the very last step of a pipeline is broken. The automatic upload to the object storage defined in the data connection doesn't work.

What did you expect to happen:
The automatic upload to the object storage defined in the data connection should work.

Additional information:
If I use this DSL https://gist.github.com/gregsheremeta/cfd619f065d0017001e6bfdcd8ca64ae (with add_pod_annotation), I don't get an error, but a 0-byte .tgz file is uploaded.

If I use this DSL https://gist.github.com/gregsheremeta/05e0baa807c052ffba6af8f2b37baf63 (without add_pod_annotation), I get an error, but I do have my expected 20 MB result in MinIO.
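
The only relevant difference between the two DSLs is whether the final task sets the artifact_outputs pod annotation. A minimal fragment of that difference (the components are defined in the full pipelines quoted in the comments below):

import json

# Fragment only: `receive_file_op` and `send_file_task` come from the full
# pipelines quoted later in this thread. With this annotation the run finishes
# cleanly but uploads a 0-byte .tgz; without it the last step errors out, yet
# the real artifact does reach MinIO.
receive_file_task = receive_file_op(
    send_file_task.output,
).add_pod_annotation(name="artifact_outputs", value=json.dumps(["saveartifact"]))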

Environment:

  • Python Version (use python --version):
  • SDK Version: 1.5.3
  • Tekton Version (use tkn version):
  • Kubernetes Version (use kubectl version):
  • OS (e.g. from /etc/os-release):

Slack threads:
https://kubeflow.slack.com/archives/CVD66E6SJ/p1689027737811769
https://kubeflow.slack.com/archives/CVD66E6SJ/p1689097472913819

@openshift-ci openshift-ci bot added the kind/bug Something isn't working label Jul 12, 2023
@gregsheremeta gregsheremeta self-assigned this Jul 12, 2023
@gregsheremeta gregsheremeta added the field-priority Identified as a high priority issue by users in the field. label Jul 12, 2023
@gregsheremeta
Contributor Author

Tommy initially suspects some artifact passing problems with running on OpenShift -- or perhaps the way we have roles or permissions set up in DSP / DSPO.

from https://kubeflow.slack.com/archives/CVD66E6SJ/p1689100848368909?thread_ts=1689097472.913819&cid=CVD66E6SJ

Just want to double check, are you using the kubeflowpipleine-runner service account for your pipeline?
https://github.com/kubeflow/kfp-tekton/blob/master/manifests/kustomize/third-party/openshift/standalone/anyuid-scc.yaml#L38
If not, you probably need to update the service account in your pipeline to have any-uid enabled
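
For reference, one way to set the run's service account when submitting from the SDK. This is a minimal sketch assuming the installed KFP v1 client exposes a service_account parameter; the host and service-account values below are placeholders, not from this issue:

import kfp

# Hypothetical sketch, not from the issue: submit the pipeline with an explicit
# service account. Host and service-account names are placeholders.
client = kfp.Client(host="https://<your-dsp-api-route>")
client.create_run_from_pipeline_func(
    wire_up_pipeline,                        # pipeline function quoted later in this thread
    arguments={},
    service_account="<pipeline-runner-sa>",  # e.g. the SA covered by the anyuid SCC
)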

@strangiato
Contributor

Issue looks similar to the one I submitted in opendatahub-io/data-science-pipelines#106

@HumairAK
Contributor

Upstream issue here: kubeflow/kfp-tekton#1292

@HumairAK
Contributor

HumairAK commented Jul 18, 2023

As a workaround, please try using this patch to the deployed DSPA:

Usage: apply_custom_script.sh <dspa_name> <dspa_namespace>

Let us know if this works. This forces the push-to-S3 process to identify the output artifacts correctly.

@gregsheremeta
Contributor Author

Let us know if [the workaround] works.

It works for me.

In this pipeline, both steps complete successfully, and both steps get the 20 MB artifact saved in MinIO:

"""Test pipeline to exercise various data flow mechanisms."""
import kfp


"""Producer"""
def send_file(
    outgoingfile: kfp.components.OutputPath(),
):
    import urllib.request

    print("starting download...")
    urllib.request.urlretrieve("http://212.183.159.230/20MB.zip", outgoingfile)
    print("done")

"""Consumer"""
def receive_file(
    incomingfile: kfp.components.InputPath(),
    saveartifact: kfp.components.OutputPath(),
):
    import os
    import shutil

    print("reading %s, size is %s" % (incomingfile, os.path.getsize(incomingfile)))

    with open(incomingfile, "rb") as f:
        b = f.read(1)
        print("read byte: %s" % b)
    
    print("copying in %s to out %s" % (incomingfile, saveartifact))
    shutil.copyfile(incomingfile, saveartifact)


"""Build the producer component"""
send_file_op = kfp.components.create_component_from_func(
    send_file,
    base_image="registry.access.redhat.com/ubi8/python-38",
)

"""Build the consumer component"""
receive_file_op = kfp.components.create_component_from_func(
    receive_file,
    base_image="registry.access.redhat.com/ubi8/python-38",
)


"""Wire up the pipeline"""
@kfp.dsl.pipeline(
    name="Test Data Passing Pipeline 1",
)
def wire_up_pipeline():
    import json

    send_file_task = send_file_op()

    receive_file_task = receive_file_op(
        send_file_task.output,
    ).add_pod_annotation(name='artifact_outputs', value=json.dumps(['saveartifact']))

...

In this pipeline, the first step succeeds, but the second step fails with the "Error while handling results: Termination message is above max allowed size 4096" message (as expected, since I left off the artifact_outputs annotation). Both steps still get the 20 MB artifact saved in MinIO (a verification sketch follows the code below):

"""Test pipeline to exercise various data flow mechanisms."""
import kfp


"""Producer"""
def send_file(
    outgoingfile: kfp.components.OutputPath(),
):
    import urllib.request

    print("starting download...")
    urllib.request.urlretrieve("http://212.183.159.230/20MB.zip", outgoingfile)
    print("done")

"""Consumer"""
def receive_file(
    incomingfile: kfp.components.InputPath(),
    saveartifact: kfp.components.OutputPath(""),
):
    import os
    import shutil

    print("reading %s, size is %s" % (incomingfile, os.path.getsize(incomingfile)))

    with open(incomingfile, "rb") as f:
        b = f.read(1)
        print("read byte: %s" % b)
    
    print("copying in %s to out %s" % (incomingfile, saveartifact))
    shutil.copyfile(incomingfile, saveartifact)


"""Build the producer component"""
send_file_op = kfp.components.create_component_from_func(
    send_file,
    base_image="registry.access.redhat.com/ubi8/python-38",
)

"""Build the consumer component"""
receive_file_op = kfp.components.create_component_from_func(
    receive_file,
    base_image="registry.access.redhat.com/ubi8/python-38",
)


"""Wire up the pipeline"""
@kfp.dsl.pipeline(
    name="Test Data Passing Pipeline 1",
)
def wire_up_pipeline():
    import json

    send_file_task = send_file_op()

    receive_file_task = receive_file_op(
        send_file_task.output,
    )
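
For completeness, one way to check what actually landed in the bucket. This is a minimal verification sketch using boto3; the endpoint, credentials, bucket, and prefix are placeholders, so substitute the values from your data connection:

import boto3

# Hypothetical verification sketch, not part of the issue: list objects under
# the run's artifact prefix and print their sizes. All connection values are
# placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://<minio-endpoint>:9000",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

response = s3.list_objects_v2(Bucket="<bucket>", Prefix="artifacts/")
for obj in response.get("Contents", []):
    # A healthy run shows the ~20 MB artifact here rather than a 0-byte .tgz.
    print(obj["Key"], obj["Size"])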

@gregsheremeta
Contributor Author

@HumairAK

As a workaround, please try using this patch to the deployed DSPA

Since it seems to work, can we make this the official fix in the next DSPO release?

@gregsheremeta
Contributor Author

Since it seems to work, can we make this the official fix in the next DSPO release?

Summarizing the Slack conversation: yes.

We can carry this fix and don't need to wait for a kfp-tekton backport with a similar fix.

@HumairAK HumairAK transferred this issue from opendatahub-io/data-science-pipelines-tekton Jul 24, 2023