push_artifact does not push output artifacts to s3 in copy-artifacts step #1292

Open
HumairAK opened this issue Jul 18, 2023 · 3 comments

@HumairAK
Contributor

/kind bug

What steps did you take and what happened:

When using data passing via kfp.components.OutputPath() and kfp.components.InputPath(), we notice that artifacts never make it into s3 storage. Instead, we see a 0kb file named after each output file.

What did you expect to happen:
Output artifacts show up in s3 storage.

Additional information:

When reproducing this, please use backing storage for the pvc that is separate from the s3 solution used by the apiserver.

Take for example:

  1. Pipeline in kfp dsl
  2. Same Pipeline after it's fed through the kfp-tekton compiler
  3. Same Pipeline after it is adjusted by apiserver, post submit

Once the Pipeline in (1) makes it to the api-server (3), we see new task steps added to manage artifact passing/tracking.
The final step added by the api-server is copy-artifacts. This step pushes the artifacts in this task to s3 storage via the push_artifact script. The problem we are seeing is that when the artifact is >4kb, this fails.

This step expects the artifact to be in /tekton/home/tep-results, but all you find there is a 0kb file named after the artifact output. This occurs because copy-results-artifacts does not copy the artifact to /tekton/home, since it is too big (>3072 bytes):

if [ -d /tekton/results ]; then mkdir -p /tekton/home/tep-results; mv /tekton/results/* /tekton/home/tep-results/ || true; fi

This seems to take the contents of /tekton/results/ and move them to /tekton/home. From the preceding step, copy-results-artifacts, we see:

 copy_artifact $(workspaces.produce-output.path)/artifacts/simple-pipeline-fe138/$(context.taskRun.name)/mydestfile $(results.mydestfile.path)

So we're expecting the contents in /workspace to move to /tekton/results, so that they can be moved to /tekton/home in the next step.

But when the pipeline is fed through the compiler in (2) above, we see that the script added in copy-results-artifacts will only move the contents of /workspace if they are <3072 bytes. (Makes sense, because we have to stay under 4kb to avoid the termination error messages, right?)

Since this file is ~20MB, that doesn't happen. Instead, we end up with an empty file being created, and this ends up trickling into push_artifact.
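To make the failure mode concrete, here is a minimal sketch of a size-gated copy like the one described above. It is not the actual script generated by the compiler: the function name copy_artifact_sketch, the variable names, and the temporary directories standing in for /workspace and /tekton/results are all assumptions for illustration. Only the 3072-byte threshold comes from this issue.

```shell
#!/bin/sh
# Hypothetical sketch of a size-gated copy, NOT the actual kfp-tekton script.
MAX_RESULT_BYTES=3072   # threshold quoted in this issue

# copy_artifact_sketch SRC DEST: copy SRC only if it is under the threshold;
# otherwise leave an empty placeholder at DEST.
copy_artifact_sketch() {
  ca_src="$1"
  ca_dest="$2"
  ca_size=$(wc -c < "$ca_src" | tr -d ' ')
  if [ "$ca_size" -lt "$MAX_RESULT_BYTES" ]; then
    cp "$ca_src" "$ca_dest"
  else
    : > "$ca_dest"
  fi
}

# Demo in a throwaway directory standing in for /workspace and /tekton/results.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/workspace" "$demo_dir/results"
head -c 100  /dev/zero > "$demo_dir/workspace/small"
head -c 5000 /dev/zero > "$demo_dir/workspace/large"

copy_artifact_sketch "$demo_dir/workspace/small" "$demo_dir/results/small"
copy_artifact_sketch "$demo_dir/workspace/large" "$demo_dir/results/large"

small_out_size=$(wc -c < "$demo_dir/results/small" | tr -d ' ')
large_out_size=$(wc -c < "$demo_dir/results/large" | tr -d ' ')
echo "small result: $small_out_size bytes"
echo "large result: $large_out_size bytes"
rm -rf "$demo_dir"
```

The small file arrives intact, while the large one becomes a 0-byte placeholder. Any later step that only looks at the results location would upload that placeholder rather than the real artifact.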

We noticed that simply fetching the push_artifact output artifact path arguments from the paths stored in tekton.dev/artifact_items seemed to work, example here. This could maybe be a trivial change, though I'm not sure if it accounts for everything.

As a workaround, we are looking at using a custom push_artifact script that will look for the artifact in workspaces (if it exists) and then push this path to s3.
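A rough sketch of the path selection such a workaround could use is below. This is only an illustration of the idea, not the script we ended up with: resolve_artifact_path, its arguments, and the demo directory layout are hypothetical, and the actual upload to s3 is left out.

```shell
#!/bin/sh
# Hypothetical helper for a custom push_artifact: choose which path to upload.
# Prefers the copy in the workspace when it exists, otherwise falls back to
# the copy in the results location.
resolve_artifact_path() {
  ra_workspace_copy="$1"   # e.g. a path under the produce-output workspace
  ra_results_copy="$2"     # e.g. a path under /tekton/home/tep-results
  if [ -e "$ra_workspace_copy" ]; then
    printf '%s\n' "$ra_workspace_copy"
  else
    printf '%s\n' "$ra_results_copy"
  fi
}

# Demo with throwaway directories standing in for the real locations.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/workspace" "$demo_dir/results"
head -c 5000 /dev/zero > "$demo_dir/workspace/mydestfile"  # real artifact
: > "$demo_dir/results/mydestfile"                          # 0-byte placeholder
head -c 10 /dev/zero > "$demo_dir/results/other"            # results-only artifact

picked_large=$(resolve_artifact_path \
  "$demo_dir/workspace/mydestfile" "$demo_dir/results/mydestfile")
picked_small=$(resolve_artifact_path \
  "$demo_dir/workspace/other" "$demo_dir/results/other")

expected_large="$demo_dir/workspace/mydestfile"
expected_small="$demo_dir/results/other"
echo "would upload: $picked_large"
echo "would upload: $picked_small"
rm -rf "$demo_dir"
```

Preferring the workspace copy means large artifacts are uploaded from where they were actually written, while small artifacts that only exist in the results location still get pushed.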

Environment:

  • SDK Version: 1.5.1
  • Tekton Version (use tkn version): 0.47.x
  • Kubernetes Version (use kubectl version): 1.25
@HumairAK
Contributor Author

We notice the same behavior when not using .add_pod_annotation() for pipelines that use data passing. Example.

@gregsheremeta
Contributor

We notice the same behavior when not using .add_pod_annotation() for pipelines that use data passing

it's a bit nuanced. What I've seen is:

  • if I have a 2-step pipeline ... step1 with an output, and step2 with an input and an output ->
    • if I leave off the artifact_outputs annotation on step2, step2's artifact gets uploaded to minio, but step2 goes to failure state with the message Error while handling results: Termination message is above max allowed size 4096. Example
    • if I include the artifact_outputs annotation on step2, step2 uploads a 0-byte tgz archive to minio, and step2 shows success. Example

I can't get both a success state and a successful upload at the same time.

@HumairAK
Contributor Author

@gregsheremeta this is because, in your example, the artifact output gets moved to /tekton/results.

push_artifact pushes everything in /tekton/results (via /tekton/home -> /tekton/results in copy-results-artifacts), so in this case the artifact does get pushed to s3. But because /tekton/results now contains a file >4kb, we get the termination error message.
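To illustrate the trade-off, the sketch below sums the sizes of the files in a results directory and compares the total with the 4096 limit from the error message. It is only an approximation under stated assumptions: results_total_bytes and the demo directory are hypothetical, and the real check applies to the whole termination message, which carries more than the raw file bytes.

```shell
#!/bin/sh
# Hypothetical pre-flight check; the limit value is taken from the error
# message quoted in this thread.
TERMINATION_LIMIT_BYTES=4096

# results_total_bytes DIR: print the combined size of regular files in DIR.
results_total_bytes() {
  rt_dir="$1"
  rt_total=0
  for rt_file in "$rt_dir"/*; do
    [ -f "$rt_file" ] || continue
    rt_size=$(wc -c < "$rt_file" | tr -d ' ')
    rt_total=$((rt_total + rt_size))
  done
  printf '%s\n' "$rt_total"
}

# Demo with a throwaway directory standing in for /tekton/results.
demo_dir=$(mktemp -d)
head -c 100  /dev/zero > "$demo_dir/small-result"
head -c 5000 /dev/zero > "$demo_dir/artifact"

total_bytes=$(results_total_bytes "$demo_dir")
if [ "$total_bytes" -gt "$TERMINATION_LIMIT_BYTES" ]; then
  over_limit=yes
else
  over_limit=no
fi
echo "results total: $total_bytes bytes, over limit: $over_limit"
rm -rf "$demo_dir"
```

With the artifact sitting in the results directory, the total is well over the limit, which matches the failure state seen when the upload itself succeeds.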


2 participants