Enable virtual environment execution for docker - poc #17163

Open: wants to merge 3 commits into main


@nparley commented Feb 17, 2025

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • If this pull request removes docs files, it includes redirect settings in mint.json.
  • If this pull request adds functions or classes, it includes helpful docstrings.

This pull request enables Prefect flows to be run in a different Python environment from the one Prefect itself executes in. The reason for this is that Prefect is relatively heavy in its requirements, and we don't want to have to mix Prefect's requirements with our own. For example, our code is not yet running SQLAlchemy (SQA) v2. We should upgrade, but we don't want to adopt Prefect and then be beholden to Prefect's dependency upgrade cycle. Even though prefect-client exists, when using the Prefect base Docker image or building a custom image, the CLI is required for the Docker command (flow-run execute? see `async def execute(` in the CLI). Therefore, even if we use prefect-client, we end up installing our flow requirements into a Docker image with full Prefect. Have I missed something here?

The flow is run in a subprocess using a Python module command, and the Prefect engine does not need full Prefect (I think?), only prefect-client, to work. Currently the subprocess is spawned using sys.executable, but this POC allows get_sys_executable to be overridden with an environment variable. This means that we can make a Dockerfile like:

```dockerfile
FROM prefecthq/prefect:3.1.7-python3.11
COPY . /opt/prefect/prefect_demo/
WORKDIR /opt/prefect/prefect_demo/
# The Poetry installer puts the poetry binary in ~/.local/bin, which is not on PATH
# by default (this assumes the build runs as root)
ENV PATH="/root/.local/bin:${PATH}"
# Install poetry
RUN curl -sSL https://install.python-poetry.org | python3 - \
    && poetry --version \
    && poetry config virtualenvs.in-project true
RUN poetry install --no-interaction --no-ansi -vv
ENV EXECUTABLE_PATH=/opt/prefect/prefect_demo/.venv/bin/python
RUN prefect version
```

The flow subprocess will then run in the Poetry virtual environment, while the default Python in the Docker container does not have to mix with the requirements of the flow code.
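A minimal sketch of the override described above, assuming the POC reads the EXECUTABLE_PATH variable set in the Dockerfile and otherwise falls back to sys.executable. The function names follow the ones mentioned in this PR, but the bodies are illustrative and are not Prefect's actual implementation.

```python
import os
import sys


def get_sys_executable() -> str:
    # Hypothetical re-creation of the POC override: prefer the interpreter named
    # by EXECUTABLE_PATH (e.g. the Poetry venv's python) when it is set and
    # non-empty, otherwise fall back to the interpreter running Prefect itself.
    return os.environ.get("EXECUTABLE_PATH") or sys.executable


def flow_run_command() -> list:
    # The flow run is started in a subprocess as a Python module command;
    # "prefect.engine" is the module this thread names as the entry point.
    return [get_sys_executable(), "-m", "prefect.engine"]


if __name__ == "__main__":
    # subprocess.run(flow_run_command(), check=True) would launch the flow run
    # under whichever interpreter was selected above.
    print(flow_run_command())
```

Keeping the fallback to sys.executable means images that never set EXECUTABLE_PATH behave exactly as before.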

If this isn't thought to be a completely crazy idea, this PR could be made mergeable. I assume that passing the variable in via job_variables might be better, etc.?


codspeed-hq bot commented Feb 17, 2025

CodSpeed Performance Report

Merging #17163 will not alter performance

Comparing nparley:enable_virtual_environment_execution (6aca084) with main (1262485)

Summary

✅ 2 untouched benchmarks

@desertaxle (Member) left a comment

Hey @nparley! Thanks for raising your issues with the scope of prefect's dependencies and putting together a POC for a solution!

You're 100% correct that prefect-client is sufficient for executing scheduled flow runs if you're running a separate Prefect server or using Prefect Cloud. I think we could improve this situation by publishing a prefect-client image alongside our currently published images.

I can look into creating a more lightweight Docker image for broader consumption, but if you want to build your own version, a Dockerfile like this should work:

```dockerfile
FROM python:3.11-slim

WORKDIR /opt/prefect

RUN apt-get update && \
    apt-get install --no-install-recommends -y \
    gpg \
    git=1:2.* \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip install prefect-client
```

That won't have support for EXTRA_PIP_PACKAGES, but it should be enough to test out if prefect-client will work for executing your flows.

@nparley (Author) commented Feb 17, 2025

@desertaxle thanks for the reply. If I create a Docker container with prefect-client, the k8s pod fails with a 128 error because the CLI command it's trying to execute is not there. I'm pretty sure the default Docker command is prefect flow-run execute {uuid}, which of course is the CLI and not part of the client (e.g. `subprocess.check_call(`).

Maybe there is a deployment command override which works with prefect-client?

@desertaxle (Member)

Ah, that's right, we ship prefect-client without a CLI. We'll need to add an alternative to prefect flow-run execute. In the meantime you can do one of two things:

  1. Use python -m prefect.engine as the command. Some features, like flow run heartbeats, won't work with this command, and cancellation and crashed hooks won't work as well, but otherwise it will work.
  2. Add this script to your image and run it on startup with something like python execute_flow_run.py:
```python
import asyncio
import os
from uuid import UUID

from prefect.runner import Runner


async def main() -> None:
    # The worker injects the ID of the scheduled flow run into the environment
    flow_run_id = UUID(os.environ["PREFECT__FLOW_RUN_ID"])

    # Load and execute that flow run
    await Runner().execute_flow_run(flow_run_id)


if __name__ == "__main__":
    asyncio.run(main())
```

That script is essentially what prefect flow-run execute does.
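Tying the two options back to the original failure (a prefect-client image has no prefect executable, so the default command cannot start), a container entrypoint could pick its command at startup. This is a hypothetical helper, not part of Prefect, and it assumes the CLI falls back to PREFECT__FLOW_RUN_ID when called without an ID, as the script above does.

```python
import shutil
import sys
from typing import List, Optional


def prefect_cli_available(path: Optional[str] = None) -> bool:
    # prefect-client ships without the `prefect` console script, so this is
    # expected to be False in an image that only installs prefect-client.
    # `path` overrides the search path (None means use $PATH).
    return shutil.which("prefect", path=path) is not None


def startup_command(path: Optional[str] = None) -> List[str]:
    # Hypothetical: prefer the CLI when it is installed (assumed to read the
    # flow run ID from PREFECT__FLOW_RUN_ID when no ID argument is given),
    # otherwise fall back to option 1 above, `python -m prefect.engine`.
    if prefect_cli_available(path):
        return ["prefect", "flow-run", "execute"]
    return [sys.executable, "-m", "prefect.engine"]
```

Remember that the prefect.engine fallback carries the limitations listed in option 1 (no heartbeats, weaker cancellation and crashed hooks).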

@nparley (Author) commented Feb 19, 2025

Thanks @desertaxle, I'll have a go at that approach. I had misread `class Runner:` and thought it was importing the database schemas and therefore required the full library, but it's actually only importing the client Pydantic schemas.

What do you think about adding:

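```python
if __name__ == "__main__":
    import asyncio
    import os
    from uuid import UUID

    # `await` is only valid inside a coroutine, so drive the runner with asyncio.run
    flow_run_id = UUID(os.environ["PREFECT__FLOW_RUN_ID"])
    asyncio.run(Runner().execute_flow_run(flow_run_id))
```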

to runner.py so that we could run something like python -m prefect.runner?
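One detail worth checking: if prefect.runner is a package (Runner appears to live in a runner.py submodule, which is an assumption about the current source layout), a `__main__` guard added to runner.py would be reached with python -m prefect.runner.runner, while python -m prefect.runner runs the package's `__main__.py` instead. The stdlib-only sketch below demonstrates that mechanism with a hypothetical throwaway package, demo_runner, which reads PREFECT__FLOW_RUN_ID the same way as the proposed snippet; it does not import Prefect.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a hypothetical prefect/runner/__main__.py. A real one would call
# asyncio.run(Runner().execute_flow_run(flow_run_id)); this demo only prints.
MAIN_PY = """\
import os
from uuid import UUID

flow_run_id = UUID(os.environ["PREFECT__FLOW_RUN_ID"])
print(f"would execute flow run {flow_run_id}")
"""


def run_as_module(flow_run_id: str) -> str:
    """Build a throwaway package and run it with `python -m demo_runner`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_runner"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # `python -m <package>` executes <package>/__main__.py
        (pkg / "__main__.py").write_text(MAIN_PY)
        env = {**os.environ, "PYTHONPATH": tmp, "PREFECT__FLOW_RUN_ID": flow_run_id}
        result = subprocess.run(
            [sys.executable, "-m", "demo_runner"],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```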

@desertaxle (Member)

@nparley adding that sounds like a great way to enable using prefect-client to kick off scheduled flow runs. Do you want to update this PR to make that change or submit a new PR?

@nparley (Author) commented Feb 21, 2025

I'll create a new PR, which is probably cleaner, and will reference this one when done.

github-actions bot commented Mar 7, 2025

This pull request is stale because it has been open 14 days with no activity. To keep this pull request open remove stale label or comment.
