[BUG]: service daemon is not running #2369

Closed · Ga0512 opened this issue Jan 27, 2024 · 12 comments
Labels: bug (Something isn't working)

Ga0512 commented Jan 27, 2024

Contact Details [Optional]

[email protected]

What happened?

Hi!

I'm trying to deploy my model using MLFlowDeploymentService from zenml.integrations.mlflow.services, but I'm getting this error message:

RuntimeError: Failed to start service MLFlowDeploymentService[e55e97f5-1fc7-49ac-9158-5de4e1e1a81d] (type: model-serving, flavor: mlflow)
Administrative state: active
Operational state: inactive
Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file:
C:\Users\edney\AppData\Roaming\zenml\local_stores\3e2793a0-8446-4b32-9980-89ace8642081\e55e97f5-1fc7-49ac-9158-5de4e1e1a81d\service.log

Note: there is nothing in service.log itself.

Code of Conduct

  • I agree to follow this project's Code of Conduct
Ga0512 added the bug (Something isn't working) label on Jan 27, 2024
htahir1 (Contributor) commented Jan 27, 2024

@Ga0512 Try setting the env variable ZENML_LOGGING_VERBOSITY=DEBUG and see if you can get more insights.

Also, can you tell us which Python version, ZenML version, MLflow version, and OS you're on? Some code to reproduce would also be nice.

Thanks!
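
For reference, the variable needs to be set in the same shell session that launches the pipeline, e.g.:

set ZENML_LOGGING_VERBOSITY=DEBUG        (Windows cmd.exe)
export ZENML_LOGGING_VERBOSITY=DEBUG     (macOS/Linux)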

Ga0512 (Author) commented Jan 27, 2024

@Ga0512 Try setting the env variable ZENML_LOGGING_VERBOSITY=DEBUG and see if you can get more insights.

Also, can you tell us which Python version, ZenML version, MLflow version, and OS you're on? Some code to reproduce would also be nice.

Thanks!

Hey!

Python 3.11.3
ZenML 0.54.0
MLflow 2.9.2
Windows 10

The file I run is below; in this case I run python run_deployment.py --config deploy, inside a virtual environment:

from typing import cast

import click
from pipelines.deployment_pipeline import (
    continuous_deployment_pipeline,
    inference_pipeline,
)
from rich import print
from zenml.integrations.mlflow.mlflow_utils import get_tracking_uri
from zenml.integrations.mlflow.model_deployers.mlflow_model_deployer import (
    MLFlowModelDeployer,
)
from zenml.integrations.mlflow.services import MLFlowDeploymentService

DEPLOY = "deploy"
PREDICT = "predict"
DEPLOY_AND_PREDICT = "deploy_and_predict"


@click.command()
@click.option(
    "--config",
    "-c",
    type=click.Choice([DEPLOY, PREDICT, DEPLOY_AND_PREDICT]),
    default=DEPLOY_AND_PREDICT,
    help="Optionally you can choose to only run the deployment "
    "pipeline to train and deploy a model (`deploy`), or to "
    "only run a prediction against the deployed model "
    "(`predict`). By default both will be run "
    "(`deploy_and_predict`).",
)
@click.option(
    "--min-accuracy",
    default=0.92,
    help="Minimum accuracy required to deploy the model",
)
def main(config: str, min_accuracy: float):
    """Run the MLflow example pipeline."""
    # get the MLflow model deployer stack component
    mlflow_model_deployer_component = MLFlowModelDeployer.get_active_model_deployer()
    deploy = config == DEPLOY or config == DEPLOY_AND_PREDICT
    predict = config == PREDICT or config == DEPLOY_AND_PREDICT

    if deploy:
        # Initialize a continuous deployment pipeline run
        continuous_deployment_pipeline(
            data_path="./data/olist_customers_dataset.csv",
            min_accuracy=min_accuracy,
            workers=3,
            timeout=60,
        )

    if predict:
        # Initialize an inference pipeline run
        inference_pipeline(
            pipeline_name="continuous_deployment_pipeline",
            pipeline_step_name="mlflow_model_deployer_step",
        )

    print(
        "You can run:\n "
        f"[italic green]    mlflow ui --backend-store-uri '{get_tracking_uri()}"
        "[/italic green]\n ...to inspect your experiment runs within the MLflow"
        " UI.\nYou can find your runs tracked within the "
        "`mlflow_example_pipeline` experiment. There you'll also be able to "
        "compare two or more runs.\n\n"
    )

    # fetch existing services with same pipeline name, step name and model name
    existing_services = mlflow_model_deployer_component.find_model_server(
        pipeline_name="continuous_deployment_pipeline",
        pipeline_step_name="mlflow_model_deployer_step",
        model_name="model",
    )

    if existing_services:
        service = cast(MLFlowDeploymentService, existing_services[0])
        if service.is_running:
            print(
                f"The MLflow prediction server is running locally as a daemon "
                f"process service and accepts inference requests at:\n"
                f"    {service.prediction_url}\n"
                f"To stop the service, run "
                f"[italic green]`zenml model-deployer models delete "
                f"{str(service.uuid)}`[/italic green]."
            )
        elif service.is_failed:
            print(
                f"The MLflow prediction server is in a failed state:\n"
                f" Last state: '{service.status.state.value}'\n"
                f" Last error: '{service.status.last_error}'"
            )
    else:
        print(
            "No MLflow prediction server is currently running. The deployment "
            "pipeline must run first to train a model and deploy it. Execute "
            "the same command with the `--deploy` argument to deploy a model."
        )


if __name__ == "__main__":
    main()

The pipelines.deployment_pipeline module:

import json
import os

import numpy as np
import pandas as pd

from steps.clean_data import clean_df
from steps.evaluation import evaluate_model
from steps.ingest_data import ingest_df
from steps.model_train import train_model
from zenml import pipeline, step
from zenml.config import DockerSettings
from zenml.constants import DEFAULT_SERVICE_START_STOP_TIMEOUT
from zenml.integrations.constants import MLFLOW
from zenml.integrations.mlflow.model_deployers.mlflow_model_deployer import (
    MLFlowModelDeployer,
)
from zenml.integrations.mlflow.services import MLFlowDeploymentService
from zenml.integrations.mlflow.steps import mlflow_model_deployer_step
from zenml.steps import BaseParameters, Output

from .utils import get_data_for_test

docker_settings = DockerSettings(required_integrations=[MLFLOW])

requirements_file = os.path.join(os.path.dirname(__file__), "requirements.txt")


@step(enable_cache=False)
def dynamic_importer() -> str:
    """Downloads the latest data from a mock API."""
    data = get_data_for_test()
    return data


class DeploymentTriggerConfig(BaseParameters):
    """Parameters that are used to trigger the deployment"""

    min_accuracy: float = 0.9


@step
def deployment_trigger(
    accuracy: float,
    config: DeploymentTriggerConfig,
) -> bool:
    """Implements a simple model deployment trigger that looks at the
    input model accuracy and decides if it is good enough to deploy"""

    return accuracy > config.min_accuracy


class MLFlowDeploymentLoaderStepParameters(BaseParameters):
    """MLflow deployment getter parameters

    Attributes:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        step_name: the name of the step that deployed the MLflow prediction
            server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """

    pipeline_name: str
    step_name: str
    running: bool = True
    model_name: str = "model"


@step(enable_cache=False)
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "model",
) -> MLFlowDeploymentService:
    """Get the prediction service started by the deployment pipeline.

    Args:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        pipeline_step_name: the name of the step that deployed the MLflow
            prediction server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the MLflow model deployer stack component
    model_deployer = MLFlowModelDeployer.get_active_model_deployer()

    # fetch existing services with same pipeline name, step name and model name
    existing_services = model_deployer.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )

    if not existing_services:
        raise RuntimeError(
            f"No MLflow prediction service deployed by the "
            f"{pipeline_step_name} step in the {pipeline_name} "
            f"pipeline for the '{model_name}' model is currently "
            f"running."
        )
    print(existing_services)
    print(type(existing_services))
    return existing_services[0]


@step
def predictor(
    service: MLFlowDeploymentService,
    data: str,
) -> np.ndarray:
    """Run an inference request against a prediction service"""

    service.start(timeout=10)  # should be a NOP if already started
    data = json.loads(data)
    data.pop("columns")
    data.pop("index")
    columns_for_df = [
        "payment_sequential",
        "payment_installments",
        "payment_value",
        "price",
        "freight_value",
        "product_name_lenght",
        "product_description_lenght",
        "product_photos_qty",
        "product_weight_g",
        "product_length_cm",
        "product_height_cm",
        "product_width_cm",
    ]
    df = pd.DataFrame(data["data"], columns=columns_for_df)
    # Round-trip through JSON to turn the DataFrame into a list of plain
    # row dicts, then into the array shape the prediction server expects.
    json_list = json.loads(json.dumps(list(df.T.to_dict().values())))
    data = np.array(json_list)
    prediction = service.predict(data)
    return prediction


@pipeline(enable_cache=True, settings={"docker": docker_settings})
def continuous_deployment_pipeline(
    data_path: str,
    min_accuracy: float = 0.9,
    workers: int = 1,
    timeout: int = DEFAULT_SERVICE_START_STOP_TIMEOUT,
):
    # Link all the steps artifacts together
    df = ingest_df(data_path=data_path)
    x_train, x_test, y_train, y_test = clean_df(df)
    model = train_model(x_train, x_test, y_train, y_test)
    mse, rmse = evaluate_model(model, x_test, y_test)
    deployment_decision = deployment_trigger(accuracy=mse)
    mlflow_model_deployer_step(
        model=model,
        deploy_decision=deployment_decision,
        workers=workers,
        timeout=timeout,
    )


@pipeline(enable_cache=False, settings={"docker": docker_settings})
def inference_pipeline(pipeline_name: str, pipeline_step_name: str):
    # Link all the steps artifacts together
    batch_data = dynamic_importer()
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        running=False,
    )
    predictor(service=model_deployment_service, data=batch_data)

safoinme (Contributor) commented

@Ga0512 Unfortunately, MLflow deployment isn't supported on Windows yet.
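
The local model deployer runs the MLflow server as a daemon process via os.fork(), which does not exist on Windows, so the daemon can never start there. A minimal illustration of the limitation (not actual ZenML code):

import os

# os.fork() is POSIX-only; daemonizing the local MLflow server relies on
# it, which is why the "service daemon" can never come up on Windows.
if not hasattr(os, "fork"):
    print("fork() unavailable - local MLflow deployment will not work here")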

Luismbpr commented Mar 13, 2024

I have the same issue, but I am running this on macOS. Is there any solution to this?

Python version 3.9.18

Package Version


catboost 1.0.5
MarkupSafe 2.1.5
mlflow 2.10.2
mlserver 1.5.0
mlserver-mlflow 1.5.0
numpy 1.26.4
optuna 2.10.0
pydantic 1.10.14
scikit-learn 1.4.1.post1
streamlit 1.32.1
tqdm 4.66.2
zenml 0.55.5

Ga0512 commented Mar 14, 2024

I have the same issue, but I am running this on macOS. Is there any solution to this?

Python version 3.9.18

Package Version

catboost 1.0.5 MarkupSafe 2.1.5 mlflow 2.10.2 mlserver 1.5.0 mlserver-mlflow 1.5.0 numpy 1.26.4 optuna 2.10.0 pydantic 1.10.14 scikit-learn 1.4.1.post1 streamlit 1.32.1 tqdm 4.66.2 zenml 0.55.5

Are you doing the freeCodeCamp MLOps course? (https://www.youtube.com/watch?v=-dJPoLm_gtE) The instructor used a Mac throughout the course and managed to overcome this problem.

Luismbpr commented

Are you doing the freeCodeCamp MLOps course?

I am indeed doing that course and have tried many times to solve this, but I still cannot manage it. I may have misunderstood something the instructor did, but I believe I did everything he did, and I still cannot deploy.

strickvl (Contributor) commented

Could you try replacing the requirements.txt file contents with this:

catboost==1.0.4
joblib>=1.1.0
lightgbm==4.1.0
optuna==2.10.0
streamlit==1.29.0
xgboost==2.0.3
markupsafe==1.1.1
zenml>=0.52.0
scikit-learn>=1.3.2
altair

Then reinstall the packages (pip install -r requirements.txt in a fresh env), run zenml disconnect and zenml down, and then try zenml up again?
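
That is, in a fresh environment:

pip install -r requirements.txt
zenml disconnect
zenml down
zenml up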

Luismbpr commented Mar 15, 2024

Could you try replacing the requirements.txt file contents with this:

catboost==1.0.4
joblib>=1.1.0
lightgbm==4.1.0
optuna==2.10.0
streamlit==1.29.0
xgboost==2.0.3
markupsafe==1.1.1
zenml>=0.52.0
scikit-learn>=1.3.2
altair

Then reinstall the packages (pip install -r requirements.txt in a fresh env), run zenml disconnect and zenml down, and then try zenml up again?

Hello. First of all, thank you for replying.

1.1) I tried to install those versions (first via pip install -r requirements.txt), and it did not work.
1.2) Then I tried installing them one by one; pip would not let me install those versions either.

—————
Python == 3.9.18 -> Seems to be working

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5

—————

  2. I ran zenml disconnect, zenml down, and zenml up many times and never got it to work.

  3. I tried creating different stacks, experiment trackers, and model deployers and set them as the active ones. I tried this many times.

  4. Something that finally seemed to work (though I am not entirely sure) was using the two lines of code from this Stack Overflow post:
     https://stackoverflow.com/questions/52671926/rails-may-have-been-in-progress-in-another-thread-when-fork-was-called

I appended these two lines to my .zshrc file:

% vim ~/.zshrc
## for MLOps deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
% source ~/.zshrc

Then I created a new stack, experiment tracker, and model deployer and set them as active.

I am still not sure which piece made it work. I have not finished the course (almost done now), but so far it seems to be working, or at least it is not displaying any errors.

Note: I found that Stack Overflow post because the ZenML logs were showing me an error similar to the one a user in that post reported.

This is a copy of the error from that Stack Overflow post:

objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.

(That message comes from macOS's Objective-C runtime, which aborts a forked child process if the runtime was already initialized in the parent; OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES disables that check, which is why the exports above help fork-based daemons.)

strickvl (Contributor) commented Mar 15, 2024 via email

Correct. That's also something we do in our CI to allow things to work on some Mac environments. I'll add something to our docs to that effect. It seems like we should make that clear.

Luismbpr commented

Correct. That's also something we do in our CI to allow things to work on some Mac environments. I'll add something to our docs to that effect. It seems like we should make that clear.

Thank you. That would be really helpful.

Just a question, now that this seems to have been the solution:

## for MLOps deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Do we need both of these, or is only one of them doing the work?

strickvl (Contributor) commented

For Macs, I think export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is the key one.
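
(To confirm it is actually set in the shell that launches ZenML, echo $OBJC_DISABLE_INITIALIZE_FORK_SAFETY should print YES.)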

Luismbpr commented

For Macs, I think export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is the key one.

Good to know. Thank you for your help.
