-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable virtual environment execution for docker - poc #17163
base: main
Are you sure you want to change the base?
Enable virtual environment execution for docker - poc #17163
Conversation
CodSpeed Performance ReportMerging #17163 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nparley! Thanks for raising your issues with the scope of prefect
's dependencies and putting together a POC for a solution!
You're 100% correct that prefect-client
is sufficient for executing scheduled flow runs if you're running a separate Prefect server or using Prefect Cloud. I think we could improve this situation by publishing a prefect-client
image alongside our currently published images.
I can look into creating a more lightweight Docker image for broader consumption, but if you want to build your own version, a Dockerfile like this should work:
FROM python:3.11-slim
WORKDIR /opt/prefect
RUN apt-get update && \
apt-get install --no-install-recommends -y \
gpg \
git=1:2.* \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
RUN pip install prefect-client
That won't have support for EXTRA_PIP_PACKAGES
, but it should be enough to test out if prefect-client
will work for executing your flows.
@desertaxle thanks for the reply. If I create a docker container with Line 38 in 1262485
Maybe there is a deployment command override which works with |
Ah, that's right, we ship
import asycnio
import os
from uuid import UUID
from prefect.runner import Runner
async def main():
environ_flow_id = os.environ.get("PREFECT__FLOW_RUN_ID")
id = UUID(environ_flow_id)
await Runner().execute_flow_run(id)
if __name__ == "__main__":
asyncio.run(main()) That script is essentially what |
Thanks @desertaxle I'm have a go at that approach. I had miss-read prefect/src/prefect/runner/runner.py Line 148 in eb9d51f
What do you think about adding: if __name__ == "__main__":
environ_flow_id = os.environ.get("PREFECT__FLOW_RUN_ID")
id = UUID(environ_flow_id)
await Runner().execute_flow_run(id) to |
@nparley adding that sounds like a great way to enable using |
I'll create a new PR probably cleaner, will reference this one when done. |
This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment. |
Checklist
<link to issue>
"mint.json
.This pull request enables prefect flows to be run in a different python environment to the prefect execution. The reason for this is that prefect is relativity heavy with it's requirements and we don't want to have to mix prefect's requirements with our own. For example our code is not running SQA v2, we should upgrade, but we don't want to adopt prefect and then be beholden to prefects dependencies upgrade cycle. Even though
prefect-client
exists, when using the prefect base docker image or building a custom image the cli is required for the docker command (flow-run execute?prefect/src/prefect/cli/flow_run.py
Line 344 in 1262485
prefect-client
we are installing our flow requirements into a docker image with full prefect. I could have missed something here?The flow is run in a subprocess using a python module command, and the prefect engine does not need (I think?) full prefect but only prefect client to work. Currently the subprocess is spawn using
sys.executable
but this POC allows the over-writing ofget_sys_executable
with an environmental variable. This means that we can make a Dockerfile like:The flow sub process will run in the poetry virtual environment while the default python in the docker container container does not have to mix with the requirements of the flow code.
If it's thought this is not a completely crazy idea this PR could be made mergeable. I assume that the variable coming via job_variables might be better etc?