[BUG]: service daemon is not running #2369

Closed · Ga0512 opened this issue Jan 27, 2024 · 12 comments
Labels: bug (Something isn't working)

Ga0512 commented Jan 27, 2024

Contact Details [Optional]

[email protected]

What happened?

Hi!

I'm trying to deploy my model using MLFlowDeploymentService from zenml.integrations.mlflow.services, but I'm getting this error message:

RuntimeError: Failed to start service MLFlowDeploymentService[e55e97f5-1fc7-49ac-9158-5de4e1e1a81d] (type: model-serving, flavor: mlflow)
Administrative state: active
Operational state: inactive
Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file:
C:\Users\edney\AppData\Roaming\zenml\local_stores\3e2793a0-8446-4b32-9980-89ace8642081\e55e97f5-1fc7-49ac-9158-5de4e1e1a81d\service.log

Note: there is nothing in service.log itself.

Code of Conduct

  • I agree to follow this project's Code of Conduct
Ga0512 added the bug (Something isn't working) label on Jan 27, 2024
htahir1 (Contributor) commented Jan 27, 2024

@Ga0512 Try setting the env variable ZENML_LOGGING_VERBOSITY=DEBUG and see if you can get more insights.

Also, can you tell us which Python version, ZenML version, MLflow version, and OS you're on? Some code to reproduce would also be nice.

Thanks!
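
For reference, the variable needs to be set in the same shell session that launches the pipeline, e.g.:

set ZENML_LOGGING_VERBOSITY=DEBUG        (Windows cmd.exe)
export ZENML_LOGGING_VERBOSITY=DEBUG     (macOS/Linux)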

Ga0512 (Author) commented Jan 27, 2024

@Ga0512 Try setting the env variable ZENML_LOGGING_VERBOSITY=DEBUG and see if you can get more insights.

Also, can you tell us which Python version, ZenML version, MLflow version, and OS you're on? Some code to reproduce would also be nice.

Thanks!

Hey!

Python 3.11.3
ZenML 0.54.0
MLflow 2.9.2
Windows 10

The file I run is below; in this case I run python run_deployment.py --config deploy, inside a virtual environment:

from typing import cast

import click
from pipelines.deployment_pipeline import (
    continuous_deployment_pipeline,
    inference_pipeline,
)
from rich import print
from zenml.integrations.mlflow.mlflow_utils import get_tracking_uri
from zenml.integrations.mlflow.model_deployers.mlflow_model_deployer import (
    MLFlowModelDeployer,
)
from zenml.integrations.mlflow.services import MLFlowDeploymentService

DEPLOY = "deploy"
PREDICT = "predict"
DEPLOY_AND_PREDICT = "deploy_and_predict"


@click.command()
@click.option(
    "--config",
    "-c",
    type=click.Choice([DEPLOY, PREDICT, DEPLOY_AND_PREDICT]),
    default=DEPLOY_AND_PREDICT,
    help="Optionally you can choose to only run the deployment "
    "pipeline to train and deploy a model (`deploy`), or to "
    "only run a prediction against the deployed model "
    "(`predict`). By default both will be run "
    "(`deploy_and_predict`).",
)
@click.option(
    "--min-accuracy",
    default=0.92,
    help="Minimum accuracy required to deploy the model",
)
def main(config: str, min_accuracy: float):
    """Run the MLflow example pipeline."""
    # get the MLflow model deployer stack component
    mlflow_model_deployer_component = MLFlowModelDeployer.get_active_model_deployer()
    deploy = config == DEPLOY or config == DEPLOY_AND_PREDICT
    predict = config == PREDICT or config == DEPLOY_AND_PREDICT

    if deploy:
        # Initialize a continuous deployment pipeline run
        continuous_deployment_pipeline(
            data_path="./data/olist_customers_dataset.csv",
            min_accuracy=min_accuracy,
            workers=3,
            timeout=60,
        )

    if predict:
        # Initialize an inference pipeline run
        inference_pipeline(
            pipeline_name="continuous_deployment_pipeline",
            pipeline_step_name="mlflow_model_deployer_step",
        )

    print(
        "You can run:\n "
        f"[italic green]    mlflow ui --backend-store-uri '{get_tracking_uri()}"
        "[/italic green]\n ...to inspect your experiment runs within the MLflow"
        " UI.\nYou can find your runs tracked within the "
        "`mlflow_example_pipeline` experiment. There you'll also be able to "
        "compare two or more runs.\n\n"
    )

    # fetch existing services with same pipeline name, step name and model name
    existing_services = mlflow_model_deployer_component.find_model_server(
        pipeline_name="continuous_deployment_pipeline",
        pipeline_step_name="mlflow_model_deployer_step",
        model_name="model",
    )

    if existing_services:
        service = cast(MLFlowDeploymentService, existing_services[0])
        if service.is_running:
            print(
                f"The MLflow prediction server is running locally as a daemon "
                f"process service and accepts inference requests at:\n"
                f"    {service.prediction_url}\n"
                f"To stop the service, run "
                f"[italic green]`zenml model-deployer models delete "
                f"{str(service.uuid)}`[/italic green]."
            )
        elif service.is_failed:
            print(
                f"The MLflow prediction server is in a failed state:\n"
                f" Last state: '{service.status.state.value}'\n"
                f" Last error: '{service.status.last_error}'"
            )
    else:
        print(
            "No MLflow prediction server is currently running. The deployment "
            "pipeline must run first to train a model and deploy it. Execute "
            "the same command with the `--deploy` argument to deploy a model."
        )


if __name__ == "__main__":
    main()

The pipelines.deployment_pipeline module:

import json
import os

import numpy as np
import pandas as pd

from steps.clean_data import clean_df
from steps.evaluation import evaluate_model
from steps.ingest_data import ingest_df
from steps.model_train import train_model
from zenml import pipeline, step
from zenml.config import DockerSettings
from zenml.constants import DEFAULT_SERVICE_START_STOP_TIMEOUT
from zenml.integrations.constants import MLFLOW
from zenml.integrations.mlflow.model_deployers.mlflow_model_deployer import (
    MLFlowModelDeployer,
)
from zenml.integrations.mlflow.services import MLFlowDeploymentService
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from zenml.steps import BaseParameters, Output

from .utils import get_data_for_test

docker_settings = DockerSettings(required_integrations=[MLFLOW])

requirements_file = os.path.join(os.path.dirname(__file__), "requirements.txt")


@step(enable_cache=False)
def dynamic_importer() -> str:
    """Downloads the latest data from a mock API."""
    data = get_data_for_test()
    return data


class DeploymentTriggerConfig(BaseParameters):
    """Parameters that are used to trigger the deployment"""

    min_accuracy: float = 0.9


@step
def deployment_trigger(
    accuracy: float,
    config: DeploymentTriggerConfig,
) -> bool:
    """Implements a simple model deployment trigger that looks at the
    input model accuracy and decides if it is good enough to deploy"""

    return accuracy > config.min_accuracy


class MLFlowDeploymentLoaderStepParameters(BaseParameters):
    """MLflow deployment getter parameters

    Attributes:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        step_name: the name of the step that deployed the MLflow prediction
            server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """

    pipeline_name: str
    step_name: str
    running: bool = True
    model_name: str = "model"


@step(enable_cache=False)
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "model",
) -> MLFlowDeploymentService:
    """Get the prediction service started by the deployment pipeline.

    Args:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        pipeline_step_name: the name of the step that deployed the MLflow
            prediction server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the MLflow model deployer stack component
    model_deployer = MLFlowModelDeployer.get_active_model_deployer()

    # fetch existing services with same pipeline name, step name and model name
    existing_services = model_deployer.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )

    if not existing_services:
        raise RuntimeError(
            f"No MLflow prediction service deployed by the "
            f"{pipeline_step_name} step in the {pipeline_name} "
            f"pipeline for the '{model_name}' model is currently "
            f"running."
        )
    print(existing_services)
    print(type(existing_services))
    return existing_services[0]


@step
def predictor(
    service: MLFlowDeploymentService,
    data: str,
) -> np.ndarray:
    """Run an inference request against a prediction service"""

    service.start(timeout=10)  # should be a NOP if already started
    data = json.loads(data)
    data.pop("columns")
    data.pop("index")
    columns_for_df = [
        "payment_sequential",
        "payment_installments",
        "payment_value",
        "price",
        "freight_value",
        "product_name_lenght",
        "product_description_lenght",
        "product_photos_qty",
        "product_weight_g",
        "product_length_cm",
        "product_height_cm",
        "product_width_cm",
    ]
    df = pd.DataFrame(data["data"], columns=columns_for_df)
    # Round-trip through JSON to turn the DataFrame into a list of plain
    # row dicts, then into the array shape the prediction server expects.
    json_list = json.loads(json.dumps(list(df.T.to_dict().values())))
    data = np.array(json_list)
    prediction = service.predict(data)
    return prediction


@pipeline(enable_cache=True, settings={"docker": docker_settings})
def continuous_deployment_pipeline(
    data_path: str,
    min_accuracy: float = 0.9,
    workers: int = 1,
    timeout: int = DEFAULT_SERVICE_START_STOP_TIMEOUT,
):
    # Link all the steps artifacts together
    df = ingest_df(data_path=data_path)
    x_train, x_test, y_train, y_test = clean_df(df)
    model = train_model(x_train, x_test, y_train, y_test)
    mse, rmse = evaluate_model(model, x_test, y_test)
    deployment_decision = deployment_trigger(accuracy=mse)
    mlflow_model_deployer_step(
        model=model,
        deploy_decision=deployment_decision,
        workers=workers,
        timeout=timeout,
    )


@pipeline(enable_cache=False, settings={"docker": docker_settings})
def inference_pipeline(pipeline_name: str, pipeline_step_name: str):
    # Link all the steps artifacts together
    batch_data = dynamic_importer()
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        running=False,
    )
    predictor(service=model_deployment_service, data=batch_data)

safoinme (Contributor) commented

@Ga0512 Unfortunately, MLflow deployment isn't supported on Windows yet.
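
The local model deployer runs the MLflow server as a daemon process via os.fork(), which does not exist on Windows, so the daemon can never start there. A minimal illustration of the limitation (not actual ZenML code):

import os

# os.fork() is POSIX-only; daemonizing the local MLflow server relies on
# it, which is why the "service daemon" can never come up on Windows.
if not hasattr(os, "fork"):
    print("fork() unavailable - local MLflow deployment will not work here")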

Luismbpr commented Mar 13, 2024

I have the same issue, but I am running this on macOS. Is there any solution to this?

Python version 3.9.18

Package Version


catboost 1.0.5
MarkupSafe 2.1.5
mlflow 2.10.2
mlserver 1.5.0
mlserver-mlflow 1.5.0
numpy 1.26.4
optuna 2.10.0
pydantic 1.10.14
scikit-learn 1.4.1.post1
streamlit 1.32.1
tqdm 4.66.2
zenml 0.55.5

Ga0512 commented Mar 14, 2024

I have the same issue, but I am running this on macOS. Is there any solution to this?

Python version 3.9.18

Package Version

catboost 1.0.5 MarkupSafe 2.1.5 mlflow 2.10.2 mlserver 1.5.0 mlserver-mlflow 1.5.0 numpy 1.26.4 optuna 2.10.0 pydantic 1.10.14 scikit-learn 1.4.1.post1 streamlit 1.32.1 tqdm 4.66.2 zenml 0.55.5

Are you doing the freeCodeCamp MLOps course? (https://www.youtube.com/watch?v=-dJPoLm_gtE) The instructor used a Mac throughout the course and managed to overcome this problem.

Luismbpr commented

Are you doing the freeCodeCamp MLOps course?

I am indeed doing that course and have tried many times to solve this, but I still cannot manage it. I may have misunderstood something the instructor did, but I believe I did everything he did, and I still cannot deploy.

strickvl (Contributor) commented

Could you try replacing the requirements.txt file contents with this:

catboost==1.0.4
joblib>=1.1.0
lightgbm==4.1.0
optuna==2.10.0
streamlit==1.29.0
xgboost==2.0.3
markupsafe==1.1.1
zenml>=0.52.0
scikit-learn>=1.3.2
altair

Then reinstall the packages (pip install -r requirements.txt in a fresh env), run zenml disconnect and zenml down, and then try zenml up again?
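
That is, in a fresh environment:

pip install -r requirements.txt
zenml disconnect
zenml down
zenml up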

Luismbpr commented Mar 15, 2024

Could you try replacing the requirements.txt file contents with this:

catboost==1.0.4
joblib>=1.1.0
lightgbm==4.1.0
optuna==2.10.0
streamlit==1.29.0
xgboost==2.0.3
markupsafe==1.1.1
zenml>=0.52.0
scikit-learn>=1.3.2
altair

Then reinstall the packages (pip install -r requirements.txt in a fresh env), run zenml disconnect and zenml down, and then try zenml up again?

Hello. First of all, thank you for replying.

1.1) I tried to install those versions (first via pip install -r requirements.txt), and it did not work.
1.2) Then I tried installing them one by one; pip would not let me install those versions either.

—————
Python == 3.9.18 -> Seems to be working

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5

—————

  2. I ran zenml disconnect, zenml down, and zenml up many times and never got it to work.

  3. I tried creating different stacks, experiment trackers, and model deployers and set them as the active ones. I tried this many times.

  4. Something that finally seemed to work (though I am not entirely sure) was using the two lines of code from this Stack Overflow post:
     https://stackoverflow.com/questions/52671926/rails-may-have-been-in-progress-in-another-thread-when-fork-was-called

I appended these two lines to my .zshrc file:

% vim ~/.zshrc
## for MLOps deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
% source ~/.zshrc

Then I created a new stack, experiment tracker, and model deployer and set them as active.

I am still not sure which piece made it work. I have not finished the course (almost done now), but so far it seems to be working, or at least it is not displaying any errors.

Note: I found that Stack Overflow post because the ZenML logs were showing me an error similar to the one a user in that post reported.

This is a copy of the error from that Stack Overflow post:

objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.

(That message comes from macOS's Objective-C runtime, which aborts a forked child process if the runtime was already initialized in the parent; OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES disables that check, which is why the exports above help fork-based daemons.)

strickvl (Contributor) commented Mar 15, 2024 via email

Correct. That's also something we do in our CI to allow things to work on some Mac environments. I'll add something to our docs to that effect. It seems like we should make that clear.

Luismbpr commented

Correct. That's also something we do in our CI to allow things to work on some Mac environments. I'll add something to our docs to that effect. It seems like we should make that clear.

Thank you. That would be really helpful.

Just a question, now that this seems to have been the solution:

## for MLOps deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Do we need both of these, or is only one of them doing the work?

strickvl (Contributor) commented

For Macs, I think export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is the key one.
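
(To confirm it is actually set in the shell that launches ZenML, echo $OBJC_DISABLE_INITIALIZE_FORK_SAFETY should print YES.)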

Luismbpr commented

For Macs, I think export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is the key one.

Good to know. Thank you for your help.
