Fix step logging when using GCS Artifact Store #2211
Comments
Hello @strickvl, I'm trying to reproduce this issue but can't. I made a GCS bucket and tried to run the first snippet and got the following error. Please let me know if you need the traceback.
The error was raised for the following line,
Here I'd patch in @bcdurak who I think was most involved with that particular part of the codebase. I think he should be able to help with this. Other things to check:
import gcsfs

fs = gcsfs.GCSFileSystem()
with fs.open('gs://your-bucket-name/test.txt', 'w') as f:
    f.write('Hello, world!')
with fs.open('gs://your-bucket-name/test.txt', 'r') as f:
    print(f.read())

(Replace 'gs://your-bucket-name/test.txt' with a valid path in your GCS bucket.)
I think I see what's going on now. Are you running the code with a GCS artifact store configured in your ZenML stack?
I see. I tried to set up a GCS artifact store but ran into some errors. There are a few steps I don't understand yet, so I will first get familiar with them. Could you please assign me to this issue?
I was able to reproduce the issue. The output I get for the initial code is
I will now work on solving the issue.
@strickvl I have fixed the issue locally and I'm getting the expected output as shown below. However, I'm facing an issue in following the contribution guidelines while running the command. Also, while opening a pull request, I read this prerequisite:
For our cloud integrations, it's enough to demonstrate that you've tested it. We don't currently run integration tests on cloud environments, so for something like this it wouldn't be possible to test it locally. Icing on the cake would be to include in the PR instructions on how someone from the core team could reproduce your local test (a code snippet and a reminder of what the stack setup would be), but beyond that I think you're OK. Also, for mypy I think you can ignore that and just make the PR. Any issues will be revealed there.
Open Source Contributors Welcomed!
Please comment below if you would like to work on this issue!
What happened?
There seems to be an issue with StepLogging when using GCS (Google Cloud Storage) as the artifact store. Specifically, only the last parts of the logs appear in the file, which suggests a problem with the log writing or saving mechanism.
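One plausible explanation for this symptom, offered as an assumption and not a confirmed diagnosis, is that each log flush rewrites the whole remote object, because object stores replace an object on write and do not append to it. The sketch below simulates that with an in-memory stand-in; `FakeObjectStore`, `flush_overwriting`, and `flush_merging` are hypothetical names and are not part of ZenML or gcsfs.

```python
from typing import Dict, List


class FakeObjectStore:
    """In-memory stand-in for a bucket: a write replaces the whole object."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def put(self, key: str, data: str) -> None:
        # No append semantics: the previous content of `key` is discarded.
        self.objects[key] = data


def flush_overwriting(store: FakeObjectStore, key: str, chunk: str) -> None:
    """Hypothetical lossy flush: write only the newest chunk."""
    store.put(key, chunk)


def flush_merging(store: FakeObjectStore, key: str, chunk: str) -> None:
    """Hypothetical safe flush: read existing content, then rewrite old + new."""
    store.put(key, store.objects.get(key, "") + chunk)


chunks: List[str] = ["line 1\nline 2\n", "line 3\nline 4\n", "line 5\n"]
lossy, merged = FakeObjectStore(), FakeObjectStore()
for chunk in chunks:
    flush_overwriting(lossy, "logs/step.log", chunk)
    flush_merging(merged, "logs/step.log", chunk)

print(repr(lossy.objects["logs/step.log"]))   # only the last chunk survives
print(repr(merged.objects["logs/step.log"]))  # all five lines are kept
```

If this is what happens with the real artifact store, the log file would hold only whatever was written by the final flush, which matches the report.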
Steps to Reproduce
Here's a snippet to reproduce the issue:
Expected Behavior
All log lines should be saved and visible in the GCS file, not just the last few.
Potential Solution
Consider using the logging.StreamHandler facility to temporarily write logs to the remote file (GCS, S3, etc.). Here's an example:
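A minimal sketch of the idea, assuming the logs can be buffered in memory and uploaded in a single write when the step finishes. `buffered_step_logs` and the `upload` callback are hypothetical names for illustration; only `logging.StreamHandler` and `io.StringIO` come from the standard library, and nothing here is ZenML's actual implementation.

```python
import io
import logging
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def buffered_step_logs(
    upload: Callable[[str], None],
    logger: logging.Logger,
) -> Iterator[None]:
    """Collect log records in memory and hand them to `upload` once on exit."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)  # stream handler writing to the buffer
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        yield
    finally:
        logger.removeHandler(handler)
        handler.flush()
        # One write of the complete log, so nothing is overwritten piecemeal.
        upload(buffer.getvalue())
        buffer.close()


# Usage: `uploaded.append` stands in for a function that writes to the bucket.
uploaded: List[str] = []
demo_logger = logging.getLogger("step_logs_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger

with buffered_step_logs(uploaded.append, demo_logger):
    for i in range(3):
        demo_logger.info("line %d", i)

print(uploaded[0])
```

With a real bucket, `upload` could be a small function that opens the target path in write mode and writes the string, in the same way as the gcsfs check elsewhere in this thread. Buffering everything in memory is a trade-off: it is simple, but very long-running steps may need periodic uploads instead.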
This approach could fit nicely in the StepLogsStorageContext class.
Additional Context
Proper log handling is crucial for debugging and monitoring pipeline performance, especially when dealing with large-scale data processing in cloud environments.